Once upon a time, the saying that you could prove anything with statistics was a sort of idiomatic way of waving away unhelpful information. The loose implication was that numbers existed allowing for any point to be proven, after a bit of massaging and omission.
Now, though, it's actually true. Any idiot can run a "poll" on the Internet or on social media and generate an official-sounding percentage. Here, for example, is a poll on social media from one such idiot: me.
Indulge me: What is 1+1?
— Philip Bump (@pbump) September 9, 2016
Here's a headline for you: 28 percent of Americans don't know basic arithmetic. A shocking indictment of the country's educational system. Down with Common Core and the new math and so on.
Or maybe, you know, this is a stupid poll answered by a nonscientific group of people, some of whom were trying to be "funny." (Please mentally flag this last point. We'll be revisiting it.) Maybe such polls, open to anyone with a Twitter account, don't actually tell us anything about the world. Maybe they don't really prove anything at all, and those who try to cite such polls to ma...
— Donald J. Trump (@realDonaldTrump) September 9, 2016
We are 60 days from the November elections, and I guess it has finally come time to address this directly. Not all surveys are useful guides to the outcome of the election or public opinion. And instead of pointing you to times that I've written about various types of polling and surveys, I've decided to just compile them all in one place.
The analysis below uses a chart-emoji scale ranking each survey method or data set from least to most trustworthy. Zero chart emoji means it is trash. Ten chart emoji is the gold standard.
Twitter polls
Zero chart emoji.
I assume you were paying attention above.
Twitter polls are really a perfect example of misleading numbers. In Donald Trump's tweet, for example, there are three numbers: 37, 63 and 81,577. That last one is the number of people who weighed in. Over 81,000! Surely that's more accurate than a poll of 600 people, right?
I will answer that question with another question. If you were hiring someone for a job, would you rather have 10 letters of recommendation from people who could vouch for them or 1,000 letters collected from random people at a supermarket? Sure, one shows hundreds of people that back the person — but the smaller group is far more representative of the information you're looking for. It's fewer people, but it's people who can actually provide some insight.
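For the numerically inclined, here's a rough sketch of that letters-of-recommendation point in Python. Everything in it is an assumption made up for illustration: a population split 50-50 on some question, a 600-person random sample, and an opt-in poll of 81,577 respondents in which people on one side are three times as likely to click (the hypothetical `yes_weight`). It is a toy model, not a description of how any real poll behaved.

```python
import random

TRUE_SHARE = 0.50  # assumed share of the whole population answering "yes"

def random_sample_estimate(true_share, n, rng):
    """Share of 'yes' answers among n people picked uniformly at random."""
    yes = sum(rng.random() < true_share for _ in range(n))
    return yes / n

def opt_in_estimate(true_share, n, yes_weight, rng):
    """Share of 'yes' answers among n self-selected respondents.

    yes_weight (hypothetical) is how many times more likely a 'yes'
    person is to respond than a 'no' person.
    """
    # Chance that any one respondent turns out to be a 'yes' person.
    p_yes = (true_share * yes_weight) / (true_share * yes_weight + (1 - true_share))
    yes = sum(rng.random() < p_yes for _ in range(n))
    return yes / n

rng = random.Random(2016)  # fixed seed so the run is repeatable
small_random = random_sample_estimate(TRUE_SHARE, 600, rng)
big_opt_in = opt_in_estimate(TRUE_SHARE, 81_577, yes_weight=3.0, rng=rng)

print(f"True share:                {TRUE_SHARE:.1%}")
print(f"600-person random sample:  {small_random:.1%}")
print(f"81,577-person opt-in poll: {big_opt_in:.1%}")
```

Under these assumptions, the small random sample should land within a few points of the true 50 percent, while the enormous opt-in poll settles near 75 percent. More respondents only make the wrong answer more precise.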
Surveys hosted at websites
Zero chart emoji.
A good rule of thumb for judging the utility of a poll: If you can email it to people to encourage them to weigh in, too, it isn't telling you much.
Let's say there's a poll at The Washington Post's website. Or, better, let's host a poll at our website, like so.
You can send this article to literally anyone and get them to take the poll. It's basic organizing: Skew the results by getting more of your allies to weigh in. But if it can be skewed, it isn't scientific. If Trump tweeted this article and encouraged people to take the poll and if Hillary Clinton didn't, it's safe to say that Trump would do better. That doesn't mean he'll do better on Election Day.
Oh, I know where you're headed by the way: But doesn't taking action indicate that they're more likely to vote? Allow me to cut you off at the pass.
Crowd sizes at rallies
Zero chart emoji.
Last month, Fox News's Eric Bolling argued that Trump's crowd size margin vs. Clinton's suggested he would win. "Here's why polls really shouldn't matter or shouldn't ever matter," Bolling said. "You pick up the phone and you say, 'Who are you going to vote for?' That person on the other end of the phone says, 'Well, I'm going to vote for Hillary Clinton.' They're not out there voting." People at rallies, though, had actually gotten up and left the house.
This is a good example of bad sampling (being limited, for example, to people who live near the rally site) and of the ability to organize to skew the results. In this case, we can add in that just because 20,000 people went to a rally doesn't mean that Trump will win a state where 5 million people vote. It simply means he can probably count on 20,000 votes — or as many votes as there are adults who are registered in the audience.
Polls from advertising firms
One chart emoji.
Recognizing that polls are a good way to get headlines, marketing firms have seized on conducting goofy polls (like, say, "What is 1+1") to make a point that can generate some media attention. Think of something like a real estate dot-com that releases data about how Trump voters are more likely to live in ranch houses and Clinton supporters are more likely to live in apartments. I just made that up, but I'm assuming such a scenario seems familiar.
As a general rule, those marketers are a lot less interested in scientific sampling and accepted poll methodologies than they are in getting a result that will get reporters' attention.
Internal campaign or party polls
Five chart emoji.
And continuing with that theme: Campaigns often do a lot of polling. The goal of polling is generally to identify demographic groups that are worth targeting and to test messages with those audiences. When we talk about a candidate's policy positions being poll-tested, this is what we mean.
Sometimes, particularly when people think a candidate is losing badly, campaigns will release data from those polls that show how well they're doing. The problem is that they normally don't provide context on how and when the question was asked. For example, are the numbers provided ones that resulted after the campaign tried out its campaign message? If so, they're suspect, since the campaign's goal is to make sure people hear that message, and it may not succeed.
Polls that are conducted over the Internet
Eight chart emoji.
There are a lot of ways to conduct a poll. Some pollsters have actual people call people at home or on their cellphones to ask questions ("live-caller" polls). Others have a prerecorded human voice that prompts people to push buttons on their phones. Others select a panel of participants to answer questions over the Internet. When people talk about "online polls," that's generally what they mean: polls that are conducted over the Internet with a small group of people.
Notice how that's different from a poll that is found on a website. You can't simply have anyone weigh in by passing a link around. It's a controlled sample of people that is determined through statistical analysis.
We don't generally include online polls in our analysis. "The Post has generally avoided citing results from non-probability Internet-based surveys such as SurveyMonkey, as it is impossible to draw a random sample of Internet users, and random selection is a widely accepted standard in drawing representative samples of any population," our pollster Scott Clement wrote this week. Put another way: The concern is that such polls may not include only letters of recommendation from those 10 people who know you.
Clement was writing that, though, to explain why we partnered with SurveyMonkey for the 50-state poll we recently completed. Clement's team reviewed results and found that SurveyMonkey's results were "broadly in line with election results, other polling benchmarks and our own trusted cellular and landline phone surveys."
That said, Clement's review only included SurveyMonkey. Clement explains that studies assessing online polls have found some variance in accuracy. So: Eight chart emoji.
Polls that use live interviewers
Nine chart emoji.
Live-caller polls are still the gold standard in surveying the electorate. We outlined exactly how The Post's process works, if you're curious, but let's dispense with three incorrect notions right out of the gate.
1. You can't tell anything about national opinion from a group of 600 people! Yes, you can. When you are going to go swimming, do you have to take the temperature of every location in the pool before you jump in? No, you dip a toe, because the temperature is almost always evenly distributed throughout the pool. Or my favorite analogy: A doctor doesn't need to remove all your blood to take a test. He only needs a vial.
Adding more respondents to the poll doesn't necessarily shrink your margin of error by much. Last fall, I made a chart showing how margins of error fall off, with diminishing returns, as sample size grows.
Compared with a poll of 100 people, a survey of 600 people will be substantially more accurate. A poll of 1,000 people, though, won't be that much more accurate than one of 600, assuming your sample is representative. Because, again, it's the sample that matters. It's the 10 people who provide the letters.
2. Those polls exclude cellphones! No, they don't. Most live-caller polling includes cellphone users. The Post's polls do.
3. People are intentionally lying to skew the results! or No one ever polled me! So the second part there is like a water molecule at the bottom of the deep end of the pool complaining that no toe ever touched it, so how could anyone know the temperature of the water.
The first point gets to what I asked you to mentally flag way back at the beginning of this thing. Yes, some people who follow me on Twitter made a joke out of a joke poll. But that's a very different thing than isolating 600 people out of America's 300-plus million among whom a significant number are willing to lie to the pollster. Most people never do get a call from a pollster! When they get that call, they generally answer honestly, because why wouldn't you? If you're a Trump supporter, why would you lie and say you backed Clinton? It simply doesn't make any sense.
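To put numbers on the first notion above (that 600 people can't tell you anything), here is a minimal sketch using the standard textbook approximation for the margin of error of a simple random sample. The function name `margin_of_error` and its defaults are my own choices for illustration: a 95 percent confidence level (`z=1.96`) and the worst-case 50-50 split (`p=0.5`). Real pollsters adjust for weighting and design effects, which this ignores.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample of size n.

    Assumes simple random sampling; p=0.5 is the worst case.
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 600, 1000):
    print(f"n={n:>5}: +/- {margin_of_error(n) * 100:.1f} points")
```

That works out to roughly 9.8 points for 100 people, 4.0 points for 600 and 3.1 points for 1,000. Going from 100 to 600 respondents cuts the margin by more than half; going from 600 to 1,000 shaves off less than a point.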
Polling averages
Ten chart emoji.
Polling averages are averages of polls, shockingly enough. I tend to use the average from RealClearPolitics, but the one from Huffington Post Pollster is also a common point of reference. (It includes more online pollsters.)
The beauty of a polling average is that it helps avoid the urge to cherry-pick results. Sure, the recent poll from CNN-ORC showed Trump with a slight lead, but a number of others have shown Clinton up. The average right now, according to RealClearPolitics, is that Clinton leads by about three points. That's lower than it has been, but it's still a lead. Is the average right and the CNN-ORC poll wrong? It's not quite that simple, getting into questions of the composition of the sample and the fact that it's sort of unclear in a stoner-talking-about-his-hands way what it means to lead in a poll in September relative to an election in November.
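A toy illustration of why the average resists cherry-picking, sketched in Python. The five polls and their margins below are invented for the example and are not real results, and the plain unweighted mean in the hypothetical `average_margin` is a simplifying assumption that may not match how any particular aggregator computes its figure.

```python
# Hypothetical Clinton-minus-Trump margins, in percentage points.
# These are invented for illustration, not real poll results.
hypothetical_polls = {
    "Poll A": -2.0,  # an outlier showing Trump slightly ahead
    "Poll B": 5.0,
    "Poll C": 3.0,
    "Poll D": 4.0,
    "Poll E": 5.0,
}

def average_margin(polls):
    """Unweighted mean of the margins (a simplifying assumption)."""
    return sum(polls.values()) / len(polls)

cherry_picked = min(hypothetical_polls.values())
print(f"Most Trump-friendly single poll: {cherry_picked:+.1f}")
print(f"Average of all polls:            {average_margin(hypothetical_polls):+.1f}")
```

Pick the one poll you like and the story is a two-point Trump lead; take all five together and it's a three-point Clinton lead. Neither is gospel, but the average is much harder to bend toward a preferred headline.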
But for heaven's sake, look how much progress we've made. This whole thing began with our asserting that a quarter of Americans don't know what 1+1 equals, and we landed at a debate over the nature of polling averages. That's progress.
Plus look at the results of our poll about the presidential campaign! What a surprising outcome!