As the U.S. blogosphere swirls with arguments about the best ways to go about (or not go about) predicting election results, Justin Wolfers makes a provocative argument today over at The Upshot in the wake of Thursday’s Scottish referendum. Wolfers claims that while the outcome was a “loss” for polling, the betting/prediction markets knew all along that the “Yes” side had a chance but was more than likely to lose, which is basically what we observed. Moreover, Wolfers makes the additional point that asking people which side they thought would win – instead of aggregating self-reports of intended votes – was a much better forecaster as well. The net result is that everyone calling the election close in the closing days was essentially looking at the wrong data to reach that conclusion.
I’m not going to argue with Wolfers about his conclusion, as I’m sympathetic to both parts of his argument. But I do want to build on it by noting the following three points.
First, the polls did not “get it wrong” in terms of the outcome. The goal in predicting a referendum is to know which side is going to win. We’ve known for a long time that the closer you get to the day of an election, the better polls are at predicting the outcome. And the final polls did indeed predict a No win. Indeed, the polling firm Ipsos MORI posted the following on Thursday during the vote:
Ipsos MORI released the very final poll of the referendum this morning (conducted for the Evening Standard) as voters were already heading for the polls. Conducted over a two day period that ended 24 hours later than the company’s poll for STV that was published last night, it adopted the same methodological tweaks to estimating the outcome used in that poll.
The poll put Yes on 45% and No on 50% while just 5% were classified as Don’t Knows. That represents a two point lower estimate of the Yes vote than in the company’s STV poll, while No’s tally is a point higher. When the Don’t Knows are excluded Yes are put on 47% (and No on 53%), two points lower than in the earlier poll. There is evidently no evidence of a last minute swing to Yes here, though the poll leaves our poll of polls unchanged on Yes 48%, No 52% – but with every single final poll putting No ahead. (emphasis added)
Note the last sentence: every poll in their poll of polls had “No” ahead at the end. This is important to remember as we do our post-hoc analysis of poll “failure” in Scotland. So what we are really discussing here is a difference of a few percentage points. This is not to say a few percentage points couldn’t have made the difference, and if the error had gone the other way, it very well could have led to the polls getting “it” (the referendum outcome) wrong.
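To make the arithmetic in the quoted poll concrete: excluding “Don’t Knows” simply renormalizes the decided vote shares. A minimal sketch (my illustration, not Ipsos MORI’s actual procedure):

```python
# Illustrative sketch of excluding "Don't Know" responses: renormalize
# the decided Yes/No shares so they sum to 100. This is my illustration,
# not Ipsos MORI's actual estimation method.
def exclude_dont_knows(yes: float, no: float) -> tuple:
    """Return (yes_pct, no_pct) after dropping undecideds and renormalizing."""
    decided = yes + no
    return round(100 * yes / decided), round(100 * no / decided)

# Ipsos MORI's final poll: Yes 45, No 50, Don't Know 5.
yes_pct, no_pct = exclude_dont_knows(45, 50)
print(yes_pct, no_pct)  # 47 53
```

This reproduces the quoted “Yes on 47% (and No on 53%)” from the raw 45/50/5 split.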
So speaking of getting it wrong, here’s the output from a Web site called “Trendsmap” that categorized the prevalence of “Yes” and “No” tweets during the day and displayed them graphically. The blue line shows the frequency of “Yes” tweets, and the red line the frequency of “No” tweets as the day went on:
Now, calling this “wrong” is maybe unfair to Trendsmap, as there is no indication on the Web site that they intended this data to be predictive. And as someone who works a lot with Twitter data these days, I can quickly point out a number of reasons why we wouldn’t expect this to be predictive: there is no indication of any attempt to restrict the tweets collected to people living in Scotland (indeed, their map displayed the entire U.K.); their sentiment scoring looks remarkably simple, relying simply on the presence of one of three “Yes” hashtags or one of three “No” hashtags; and there was no attempt to correct for the representativeness of Twitter users in Scotland vis-à-vis the overall population.
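To see just how simple that kind of scoring is, here is a minimal sketch of hashtag-based tweet classification. The hashtag sets are illustrative assumptions on my part, not Trendsmap’s actual lists:

```python
# A minimal sketch of hashtag-presence classification of the kind
# described above. The hashtag sets below are illustrative assumptions,
# not Trendsmap's actual lists. Note the obvious weaknesses: no
# geographic filtering, no handling of punctuation stuck to tags,
# no correction for who tweets in the first place.
YES_TAGS = {"#yesscotland", "#voteyes", "#yes"}
NO_TAGS = {"#bettertogether", "#voteno", "#no"}

def classify(tweet: str) -> str:
    """Label a tweet 'yes', 'no', or 'neither' by hashtag presence alone."""
    words = set(tweet.lower().split())
    if words & YES_TAGS:
        return "yes"
    if words & NO_TAGS:
        return "no"
    return "neither"

print(classify("Polling day! #VoteYes"))  # yes
```

Counting the “yes” and “no” labels over time would produce exactly the kind of frequency lines Trendsmap displayed – which is why such counts measure tweeting enthusiasm, not vote intention.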
But what strikes me as potentially useful about the Twitter data is viewing it in combination with the polling data. Suppose someone had told you before the election that the final polls (No at 52 percent) were likely to be off by 3 percentage points, but they didn’t know in which direction. At that point, figuring out that direction would be crucially important, and could hinge at least in part on knowing which survey response (i.e., “Yes” or “No”) would be most likely to trigger a “Bradley effect” – that is, an overestimate of support for one side because people didn’t want to admit they were voting the other way, fearing that others (including the pollster) might think badly of them. From this perspective, the Twitter data might prove useful: it could show us which side had the popular enthusiasm, thus making it harder for people to admit to pollsters that they might not vote that way – which in this case would be the “Yes” vote.
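A toy calculation makes the mechanism concrete. Suppose some fraction of No voters, reluctant to admit their choice, tell pollsters they will vote Yes; the poll then overstates Yes support. The “shy” rate below is an assumed number for illustration only (the actual referendum result was roughly 44.7 percent Yes):

```python
# Toy illustration of a "Bradley effect": if a fraction of No voters
# claim to pollsters that they will vote Yes, the poll overstates Yes.
# The shy_no_rate of 6% is an assumption for illustration, not an estimate.
def polled_yes(true_yes: float, shy_no_rate: float) -> float:
    """Yes share a poll records if `shy_no_rate` of No voters claim Yes."""
    true_no = 1 - true_yes
    return true_yes + true_no * shy_no_rate

# True Yes share of 44.7% plus 6% of No voters claiming Yes:
print(f"{polled_yes(0.447, 0.06):.1%}")  # 48.0%
```

Under these assumed numbers, a true 44.7 percent Yes vote would show up in polls at about 48 percent – roughly the gap between the final polls and the actual result.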
Of course it might also be the case that the Twitter data was simply providing more useful insight into the votes of younger people, in which case its prognosticating value looks a lot better, as this fascinating chart from the Wall Street Journal reveals:
Finally, I also want to note that we featured two sets of predictions regarding the Scottish referendum here at The Monkey Cage that relied on poll aggregation techniques from Arkadiusz Wiśniowski of the University of Southampton. The first appeared on July 30th, at a time when most forecasters were confidently calling an easy No victory, under the provocative headline “Scottish independence vote is too close to call.” The post was ridiculed in the comments section, but in the ensuing weeks we did indeed see the polls tighten, with most people coming to believe that the referendum was in fact too close to call. However, just as that conclusion was beginning to take hold, Wiśniowski published a second piece on August 20th entitled “Odds of a Scottish Yes vote are fading fast.” At that point, the model showed the following:
Our forecast suggests the odds of a Yes vote for independence are fading fast. The forecast is now centered at 47 percent with the 95 percent predictive interval ranging from 44 percent to 51 percent. This differs substantially from our forecast approximately three weeks ago, reflecting the sharp movement in the polls. The probability that the Yes campaign will obtain more than 50 percent of the vote is now only just above 5 percent.
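As a rough back-of-the-envelope check (my approximation, not the authors’ actual model), treating the forecast as normally distributed with mean 47 and a 95 percent interval spanning 44 to 51 implies a probability of a Yes win close to the quoted figure:

```python
# Back-of-the-envelope check of the quoted forecast, assuming (my
# simplification, not the authors' model) a normal distribution with
# mean 47 and a 95% interval of width 51 - 44 = 7 points.
from statistics import NormalDist

mean = 47.0
sd = 7.0 / 3.92  # a 95% normal interval spans about 3.92 standard deviations
p_yes_wins = 1 - NormalDist(mean, sd).cdf(50.0)
print(round(p_yes_wins, 2))
```

This crude symmetric approximation lands in the neighborhood of 5 percent, consistent with the “only just above 5 percent” reported above (the published interval is slightly asymmetric, so the match is not exact).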
From that perspective, this looks perhaps not so much like a loss for polling as another win for poll aggregation methods?
[h/t to Per Dutton for Trendsmap Scotland page.]