Polling has had a bumpy few years after polls in several states overestimated Hillary Clinton’s support in 2016, tipping a line of dominoes that led observers to think she was a heavy favorite to be elected president that year. As you know, she wasn’t. Instead, Donald Trump squeaked past Clinton in Wisconsin, Pennsylvania and Michigan, earning enough electoral votes to win the presidency.
Over the course of his campaign, Trump had regularly disparaged polling, pretty obviously because the polling showed him trailing. He would insist that people weren’t responding to pollsters or that pollsters were getting it wrong. He established, in short, an argument in advance that polls couldn’t be trusted. That some of those polls ended up missing the result, with dramatic effect, seems only to have solidified those negative perceptions of polling among Americans, particularly Americans who voted for Trump.
This is unfortunate, largely because polling is broadly trustworthy and informative. But with every new poll, such as one published Thursday by Marist, there’s a chorus of skepticism about what the results show.
We have dubbed this article an FAQ — responses to frequently asked questions — but, in the modern style of political discourse, it’s really meant to be an FOC — a debunking of frequently offered complaints. Those who deal with polling regularly, like me, hear thematically consistent complaints and disparagements about the credibility of polling. This article is meant to answer as many of those complaints as possible.
How does polling work?
There are a few types of polls. The traditional form of polling is a live-caller poll, where an interviewer calls someone on a landline or a cellphone, and asks questions. There are also polls in which a random number is dialed, and the person who answers listens to automated prompts and presses certain numbers to record a response. And there are polls that are conducted online with preselected panels of respondents (which we’ll discuss below).
There are also garbage surveys in which websites ask visitors to click on a choice from a list. This is the sort of thing that Trump’s former attorney Michael Cohen reportedly tried to game in 2015. We’ll talk more about these, too.
We wrote about The Washington Post’s live-caller polling process in great detail in 2015. It provides probably the best, most thorough answer to this question. But let’s walk through the short version.
A polling company determines a big pool of phone numbers from across the country. Interviewers sit down at computers that begin calling numbers at random. The number of people interviewed varies, but generally it is between 600 and 1,000. To speak to that many people, thousands more are tried, some of whom refuse to participate and some of whom may not meet the requirements of that particular poll (such as being a registered voter). When the New York Times ran live polls that were tracked online during the 2018 midterms, it ended up making 2.8 million calls for 96 distinct polls. That’s an average of about 29,000 calls per poll, each of which ended up having between 400 and 600 respondents.
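To put the Times figures above in perspective, here is the back-of-the-envelope arithmetic as a short Python sketch. The call and poll counts come from the paragraph above; treating every poll as having used the average number of calls is a simplifying assumption.

```python
total_calls = 2_800_000      # calls the Times reported making
polls = 96                   # distinct polls in that project
calls_per_poll = total_calls / polls

print(f"Average calls per poll: {calls_per_poll:,.0f}")

# Each poll ended up with between 400 and 600 respondents.
for respondents in (400, 600):
    calls_per_interview = calls_per_poll / respondents
    completion_rate = respondents / calls_per_poll
    print(f"{respondents} respondents: about one completed interview per "
          f"{calls_per_interview:.0f} calls ({completion_rate:.1%})")
```

In other words, only a percent or two of calls turned into a completed interview.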
Are the responses from people who answer the questions reported directly?
Once an interviewer connects with a respondent, the responses from the call are logged into a database. Those responses are tracked to ensure that the people who are answering the questions represent the region being polled. In other words, if you’re doing a national poll, you’re going to want a mix of white, black and Hispanic respondents that looks generally the way the country’s population does — as well as the right mix by age, gender and education.
There’s no way to do this perfectly, so pollsters use weighting — applying statistical analysis to the existing results — to adjust the results to match the population. If, for example, a poll included a lower percentage of black respondents than there are nationally, the responses from those respondents would be weighted slightly more to make up the difference. It’s just math, basically.
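The weighting step described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a single demographic variable and made-up numbers; real pollsters weight on several variables at once, and the function names, group labels and shares here are all hypothetical.

```python
def poststratification_weights(sample_counts, population_shares):
    """Return one weight per group so the weighted sample matches the population.

    weight = (group's share of the population) / (group's share of the sample)
    """
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }


def weighted_support(sample_counts, support_rates, weights):
    """Weighted share of respondents backing a candidate."""
    weighted_total = sum(sample_counts[g] * weights[g] for g in sample_counts)
    weighted_yes = sum(
        sample_counts[g] * weights[g] * support_rates[g] for g in sample_counts
    )
    return weighted_yes / weighted_total


# Hypothetical poll of 1,000 people in which group B is underrepresented.
sample_counts = {"A": 900, "B": 100}          # who actually answered
population_shares = {"A": 0.87, "B": 0.13}    # assumed true population mix
support_rates = {"A": 0.40, "B": 0.80}        # assumed support within each group

weights = poststratification_weights(sample_counts, population_shares)
raw = sum(sample_counts[g] * support_rates[g] for g in sample_counts) / 1000
adjusted = weighted_support(sample_counts, support_rates, weights)

print(weights)               # group B answers count a bit more than group A answers
print(round(raw, 3))         # unweighted estimate
print(round(adjusted, 3))    # weighted estimate
```

With these made-up inputs, each group B response counts 1.3 times and the estimate moves from 44 percent to about 45 percent. It's a small nudge, not a rewrite of the results.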
How can we be confident that weighting works? Well, because polling is broadly accurate. But there are times when polls are off the mark, and it’s because the pollsters' estimates of the proper population ended up being incorrect, sometimes because the people who turned out to vote weren’t the ones the pollsters expected.
In 2016, the Times gave four pollsters a set of data from a poll and each pollster made assumptions about the electorate, returning four different poll results. On net, though, the differences covered a five-point spread.
Weighting requires talking to the right population of people so that you have enough responses from certain populations to be able to accurately represent their views. Good polls that are conducted over the Internet often use predetermined panels so that the pollsters understand who is offering responses and can weight accordingly.
This is also why click-in online polls are junk; there’s no real way to control for who’s responding, even if you wanted to weight the results, which those polls don’t. Those polls are like releasing a national poll in which the only phone calls that are made are to 600 white people who live in Manhattan. Or, more accurately, like writing a phone number on a wall in San Diego and logging the results of anyone who calls in.
Here is something to read — which polls are worth paying attention to and which aren’t.
How can pollsters estimate how the country feels after talking to only 1,000 people?
The more people you ask, the more accurate your results. To a point.
It seems counterintuitive that you could talk to 600 or even 1,000 people and get a good sense of what a country of 320 million people is thinking. Until you look at it another way: When you go to the doctor, does she have to remove all of your blood to conduct tests, or does she just take a sample?
The answer, assuming you are alive right now, is the latter. She takes a representative sample of your blood and runs the tests. That’s what pollsters do with polls.
A poll’s margin of error depends largely on the number of people you talk to. If you talk to 100 people for a national poll, you’re looking at a nine-percentage-point margin of error (using the stipulations indicated on the graph below). Crank it up to 600, and you’re down to four points. As you add respondents, though, the reduction in margin of error diminishes. At 1,000, it’s 3.1 points. Add another 1,000 people? Down to 2.2 points.
Because it takes a lot of calls — and time and hourly wages for the interviewers — to get those 1,000 responses, there’s not really much reason to interview 20,000 people for the poll. The margin of error won’t drop that much.
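The diminishing returns described above fall out of the textbook margin-of-error formula. The sketch below assumes a simple random sample, a 95 percent confidence level and a 50-50 split; those are my assumptions, since the stipulations behind the article's graph aren't spelled out in the text.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error, in percentage points, for a simple random sample.

    Assumes a 95 percent confidence level (z = 1.96) and the worst-case
    50-50 split (p = 0.5). Weighted polls report somewhat larger margins.
    """
    return 100 * z * math.sqrt(p * (1 - p) / n)


for n in (100, 600, 1000, 2000, 20000):
    print(f"{n:>6} respondents: +/- {margin_of_error(n):.1f} points")
```

Under these assumptions the 600, 1,000 and 2,000 figures match the ones cited above (4.0, 3.1 and 2.2 points). The 100-person sample comes out closer to 10 points than nine, so the graph presumably used slightly different settings. Going all the way to 20,000 respondents only gets you to about 0.7 points.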
Why do polls include so many more Democrats than Republicans?
People looking to criticize polls often focus on the sample size (discussed above) or on the party composition of the respondents. How can a poll be accurate, one common complaint goes, if more of the respondents are Democrats than Republicans?
For a few possible reasons. For one thing, more Americans currently identify as Democrats than as Republicans, according to Gallup polling. So a national poll should reflect that. For another, the polls are weighted to make sure that the responses match as closely as possible the population being surveyed.
That a poll includes more Democrats than Republicans (or vice versa) means neither that it’s inaccurate nor that it’s biased. And, again, the proof is in the pudding: Polling is generally accurate. (See the first question in the next section.)
Why do pollsters sometimes oversample in polls? Isn’t this an attempt to skew the results?
One of the many mini controversies in 2016 was a conservative website reporting that a particular poll had “oversampled” a certain population in its poll, which, the site argued, showed how pollsters skewed poll results for Democrats.
The poll in question oversampled Native Americans. Why? Because Native Americans are a small part of the population. The pollster was asked to “oversample” from that group — that is, get more Native American people into the pool of respondents — so that there were enough Native Americans in the poll to get statistically useful information about them.
Just as polls that include more respondents have lower margins of error, so do poll results that focus on a particular subgroup. If you have responses from 50 black people in your poll, the margin of error for results among black respondents will be huge — and therefore not very useful. If you oversample black respondents so that you have 300 respondents who are black, you can learn more about opinions from black voters. Nor do you then skew the results of the poll overall: Pollsters simply weight the results to account for that overrepresentation.
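Here is a rough Python sketch of both halves of that argument: why 300 respondents tell you more than 50, and why weighting keeps the oversample from skewing the topline. The margin-of-error formula assumes a simple random sample at 95 percent confidence, and the population share and support rates are invented for illustration.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """95 percent margin of error, in points, for a simple random sample."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


# Subgroup precision: 50 respondents versus an oversample of 300.
print(round(margin_of_error(50), 1))    # roughly 14 points
print(round(margin_of_error(300), 1))   # under 6 points

# Weighting the oversample back down (hypothetical numbers).
# The group is 12% of the assumed population but 30% of the 1,000 interviews.
sample = {"oversampled": 300, "everyone_else": 700}
population_share = {"oversampled": 0.12, "everyone_else": 0.88}
support = {"oversampled": 0.85, "everyone_else": 0.40}   # assumed support rates

n_total = sum(sample.values())
weights = {g: population_share[g] / (sample[g] / n_total) for g in sample}

unweighted = sum(sample[g] * support[g] for g in sample) / n_total
weighted = sum(sample[g] * weights[g] * support[g] for g in sample) / sum(
    sample[g] * weights[g] for g in sample
)

print(round(unweighted, 3))   # what you'd get if you ignored the oversample
print(round(weighted, 3))     # topline after weighting the group back to 12%
```

In this made-up case the unweighted number would overstate the candidate's support (53.5 percent versus 45.4 percent after weighting), which is exactly the skew that weighting removes.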
How do exit polls work?
That’s a whole other question, which we’ve explored in detail before. The short version is that, in traditional exit polling, people fill out surveys after having voted, generally right at the polling place, in which they provide demographic information and answer questions about their votes. Those polling places used in exit polling are selected to facilitate the weighting of the results — meaning that they are chosen to be a representative swath of the region being polled.
The key thing to remember about exit polls is that those responses are eventually weighted with the results of the election. They aren’t predictive (though they’re used for predictions); they’re meant to offer insights about who voted. So early exit polls are re-weighted as more election results come in — precisely because the results are establishing the proper population against which the surveys should be weighted.
Put another way: If the initial results of the poll show Candidate X up by five points but the final result shows that Candidate X lost by three, the exit pollsters will know that their results were too heavily weighted for the types of voters that chose Candidate X and will then re-weight the data to match how the results looked.
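The Candidate X example can be sketched in Python. This is a deliberately simplified illustration with invented interview counts and a single demographic question. Actual exit-poll weighting is more involved, and none of these names or numbers come from a real exit poll.

```python
# Hypothetical exit-poll interviews: (candidate chosen, age group, count)
interviews = [
    ("X", "under_45", 300), ("X", "45_plus", 225),
    ("Y", "under_45", 175), ("Y", "45_plus", 300),
]
actual_share = {"X": 0.485, "Y": 0.515}   # final result: X loses by three

total = sum(n for _, _, n in interviews)
polled_share = {
    c: sum(n for cand, _, n in interviews if cand == c) / total
    for c in actual_share
}
# The raw interviews show X at 52.5% and Y at 47.5%: X up by five.

# Re-weight so each candidate's voters match the actual vote share.
weights = {c: actual_share[c] / polled_share[c] for c in actual_share}


def share_under_45(use_weights):
    """Estimated share of the electorate under 45, with or without re-weighting."""
    numerator = denominator = 0.0
    for cand, age, n in interviews:
        w = weights[cand] if use_weights else 1.0
        denominator += n * w
        if age == "under_45":
            numerator += n * w
    return numerator / denominator


print(round(share_under_45(False), 3))   # before re-weighting
print(round(share_under_45(True), 3))    # after matching the actual result
```

Because Candidate X's voters skew younger in this toy data, scaling them down also lowers the estimated share of voters under 45. That is the sense in which the results establish the population the survey is weighted against.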
Why haven’t I ever been called for a poll?
This is a common complaint, sometimes offered as a way to disparage the idea that polls are even conducted. There’s this theory out there that media outlets just make up numbers, which is dumb for a variety of reasons, including that we often rely on those polls to inform our coverage (where we go and what we ask people about) and because we spend tons of money doing so. And, for the 17th time in this article: The polls are broadly right.
So why haven’t you been called? Well, there are a lot of people in the United States, and maybe your phone number has never come up in a random call list. Or maybe you’re not registered to vote, which will decrease the odds you’re on a call list for a political poll. Or maybe you were called and you saw that the caller was SSRS Surveys on your caller ID, and you rolled your eyes and sent it to voice mail. Or maybe you weren’t home. Or maybe you answered and then thought it was a spam call, and you immediately hung up.
Lots of people have been called for polls. Ask around.
So that’s how polls work. Now let’s talk about the results of the polling.
Aren’t polls consistently wrong and unreliable?
Finally! We get to this question.
No. They are not.
Scott Clement, The Post’s polling director, wrote about the accuracy of polls last summer, pointing to two analyses that showed how broadly accurate political polling has been in recent years. Here’s a report on political polling to which Clement contributed, if you want to get wonky.
Are polls sometimes wrong? Yes, but there are several ways to look at this. Sometimes polls suggest a result that’s way off the mark, as in Michigan during the Democratic primary in 2016. Sometimes a poll predicts the wrong winner. Sometimes a poll predicts the wrong actual distribution of support. Let’s consider those separately.
The poll is way off. The best recent example of this comes from 2016, but not November 2016. In March that year, polling suggested a Clinton landslide in the Democratic primary in Michigan — a primary that Sen. Bernie Sanders (I-Vt.) narrowly won. Why was the poll wrong? In large part because this was an unusual election that followed an unusual election in 2008. Pollsters look at past elections to determine who’s likely to vote (which they then use to weight survey results), but in 2008 there was no competitive primary in the state. So it had been more than a decade since a competitive primary in Michigan, and the pollsters' models were off.
These errors happen. That’s one reason it’s good to look at a lot of polls of the same election or location instead of one or two. Outlier polls — those with results that differ widely from the consensus — are easier to spot when there are a lot of similar polls with which to compare them. (There weren’t a lot of polls in Michigan in the 2016 primary.)
Here — we made an interactive about that!
The poll predicts the wrong winner. This will happen! If you have a poll with a margin of error of four points that shows Candidate A with a one-point lead, there are pretty good odds that Candidate B could end up winning by two points. The problem here isn’t the poll — it’s in how you or the media understand the poll. We (meaning I) often seize on close results because they’re interesting, with small differences between candidates or between views on an issue. Clement or his colleague Emily Guskin, our polling analyst, will gently remind me that such differences are not significant — that a 51-to-49 result in a poll generally means, in a statistical sense, that a 49-to-51 result is about as likely.
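To put rough numbers on that, here is a small Python sketch using a normal approximation. It assumes a two-candidate race with no undecided voters and a reported margin of error at the 95 percent level; the function name is hypothetical, and real polls are messier than this.

```python
from statistics import NormalDist


def chance_true_lead_below(observed_lead, threshold, margin_of_error, z=1.96):
    """Rough chance the true lead is below `threshold` (all values in points).

    Assumes a two-candidate race with no undecided voters, so the lead is
    (2 * share - 100) and its standard error is twice that of one share.
    """
    se_share = margin_of_error / z
    se_lead = 2 * se_share
    return NormalDist(mu=observed_lead, sigma=se_lead).cdf(threshold)


# Candidate A leads by one point in a poll with a four-point margin of error.
behind = chance_true_lead_below(1, 0, 4)        # A is actually trailing
behind_by_two = chance_true_lead_below(1, -2, 4)  # B is actually up two or more

print(round(behind, 2))         # roughly 0.4
print(round(behind_by_two, 2))  # roughly 0.23
```

Under these assumptions, a one-point lead is not far from a coin flip, which is the point Clement and Guskin keep making.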
The poll gets the percentages wrong but the result right. We’re so accustomed to using polls to predict winners that we are more comfortable with a poll that calls the right winner but misses the margin by 15 points than a poll like that 51-to-49 one I just described. But the 15-points-off poll is much worse, since the result is so far outside the margin of error.
An extreme example of this bias for calling the winner came with the Los Angeles Times-USC poll in the 2016 election. It used an online panel and consistently showed Trump winning the election by several percentage points. He ended up losing the popular vote, which is what that poll was trying to estimate, but since he won the presidency, this poll has been cited (including by the Los Angeles Times) as being one of the more accurate polls of the cycle. It was one of the least accurate polls, because it was way off the mark on what it was trying to measure.
There were, again, big misses in states that Trump ended up narrowly winning, misses that meant expectations about the results of the presidential race were off the mark. But in most states and nationally, the polls were on the mark.
Notice what we’ve done here! We’ve spent a bunch of time looking at the exceptions to when polls were accurate. That’s what our brains do, too: We ignore accurate poll results in favor of the times we were surprised by the result. You don’t remember the times you drove to work without incident; you remember the time you rear-ended someone outside the Environmental Protection Agency and totaled your wife’s car. (Me, 2011.) That’s one of the biases that influence how people view polls.
Now to the other big question.
Why did polls show Clinton winning 98 percent of the vote in 2016?
One of the things that’s made political reporting more interesting in recent years is the increase in poll averages, which (as with the Michigan example above) use aggregates of poll results to come up with an average. In 2016, the RealClearPolitics polling average, which includes most polls, ended up giving Clinton a 3.3-point advantage. She won the popular vote by 2.1 points — meaning that the average was 1.2 points off the mark. (To Clement’s point — remarkably accurate.)
There’s another tool that’s been popular recently, too — election forecasting. FiveThirtyEight’s Nate Silver rose to national attention in 2012 when he applied poll results to the 50 states and predicted with remarkable accuracy just how likely President Barack Obama’s reelection was.
At the end of the 2016 election, FiveThirtyEight’s model showed Clinton with a 71.4 percent chance of winning. The way it worked was straightforward: They ran thousands of versions of the election every day, with different results each time based on polling being slightly off in different ways in different states. In 71.4 percent of the results, Clinton won. In 28.6 percent of them, she didn’t.
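The general idea of running the election thousands of times can be sketched in Python. To be clear, this is not FiveThirtyEight's model: it is a toy with three invented regions, made-up polling leads and assumed error sizes, meant only to show how "polls slightly off in different ways" turns into a win percentage.

```python
import random

# Hypothetical map: region -> (electoral votes, polled lead for candidate A in points)
REGIONS = {"North": (20, 4.0), "Middle": (16, 1.0), "South": (29, -2.0)}
VOTES_TO_WIN = 33   # majority of the 65 votes in this toy map


def simulate_once(rng, national_sd=3.0, region_sd=3.0):
    """One simulated election: polls miss by a shared error plus a local one."""
    national_miss = rng.gauss(0, national_sd)
    votes_for_a = 0
    for electoral_votes, polled_lead in REGIONS.values():
        actual_lead = polled_lead + national_miss + rng.gauss(0, region_sd)
        if actual_lead > 0:
            votes_for_a += electoral_votes
    return votes_for_a >= VOTES_TO_WIN


def win_probability(trials=10_000, seed=0):
    """Share of simulated elections that candidate A wins."""
    rng = random.Random(seed)   # seeded so the run is repeatable
    wins = sum(simulate_once(rng) for _ in range(trials))
    return wins / trials


print(f"Candidate A wins in {win_probability():.1%} of simulated elections")
```

The shared `national_miss` term matters: it lets the polls be wrong in the same direction everywhere at once, which is roughly what happened in the states Trump narrowly won.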
Before the midterms, we talked a bit about probability, given the amount of discussion at that point about the odds of the Democrats retaking the House or Senate. I made this tool, which visualizes various odds. Give it a shot.
You have $1,000. How much would you bet on a red square being randomly picked out of the box below?
Now here’s one with 71 percent odds. How often does the result land on red?
Not as often as you might think. But people saw “71.4 percent” and thought “gonna happen.”
Well, that wasn’t the whole problem. The other problem is that other models at other sites gave Clinton much higher odds to win. The New York Times' model gave her a much more substantial 85 percent chance of winning. This is still not a sure thing, but it’s closer. In late October, weeks before the election, the percentage in the Times' model topped 90 percent — helping to contribute to this idea that “the polls” said Clinton was going to win with 98 percent of the vote, or whatever.
No poll showed her winning in a landslide. No poll showed her with 98 percent odds of winning; that’s not what polls do. These were instead estimates of the odds of something happening. As the red-box tool above shows, even in a situation where there is a probability of 99 percent that something happens, 1 out of 100 times that thing won’t happen.
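If you can't play with the red-box tool, a few lines of Python make the same point. This sketch assumes a box of 100 squares and draws from it at random; the function name and the number of draws are arbitrary choices of mine.

```python
import random


def red_draw_rate(red_squares, total_squares=100, draws=10_000, seed=0):
    """Share of random draws that land on a red square."""
    rng = random.Random(seed)   # seeded so the run is repeatable
    box = ["red"] * red_squares + ["other"] * (total_squares - red_squares)
    hits = sum(rng.choice(box) == "red" for _ in range(draws))
    return hits / draws


for odds in (71, 85, 99):
    miss_rate = 1 - red_draw_rate(odds)
    print(f"{odds}% odds: missed in about {miss_rate:.1%} of draws")
```

At 71 percent odds, you miss red more than a quarter of the time. Even at 99 percent, the misses never quite go away.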
But much more importantly: The polls were broadly accurate. They said she would get more votes. She got more votes — by about the margin that was expected.
Not only that, but the other recent contests that have involved a lot of polling were similarly accurate. Trump led the Republican primary field for months. Pundits predicted that he would lose despite those polls; he didn’t. The 2018 polling was quite accurate, as Clement wrote in November. It was so good, in fact, that Democrats felt bummed about the blue wave for a while because the gains in the House were . . . exactly as predicted. It’s the difference between watching your favorite team win by 30 points and being told your team is going to win by 30 points right before the game starts.
Corollary: Why were The Post’s polls so wrong?
They weren’t. Polls change over time as voters' opinions change. Our last poll with our partners at ABC News had Clinton winning the popular vote by four points. She won by 2.1 points.
Weren’t the polls wrong because of voter fraud?
This assertion (which crops up on occasion) is working backward. This isn’t an assertion about polling; it’s an assertion about voter fraud. It’s a claim that presumes polls were wrong and uses that inaccuracy to claim that, therefore, there exists voter fraud that accounts for that discrepancy.
There is no evidence, even more than two years later, that there was any wide-scale in-person voter fraud in the 2016 elections — or in any election.
Trump is more popular than polls indicate, especially with black Americans.
This claim is made most publicly by President Trump.
What’s interesting about it is that Trump didn’t really make this claim during the 2016 primary, when he led in the polls. It was only when he trailed in general-election polling and now that his approval ratings aren’t that great that he insists the polls are off the mark. As noted above, the polls have been accurate despite his assertions.
Trump also likes to cherry-pick polls that show what he wants to see, such as the consistently generous polls from Rasmussen Reports that several times have shown him with much more robust support from black Americans than other pollsters have found. Rasmussen’s midterm polling, though, was far from the mark. The RealClearPolitics average of polls at the end of the election estimated the Democrats would win seven percentage points more of the House vote. They won eight points more.
Rasmussen’s last poll had the Republicans winning that vote.
Trump has also claimed (several times) that his support with black Americans has doubled. It hasn’t.
Corollary: Trump supporters aren’t admitting they support Trump, and that affects the numbers.
There’s no evidence that Trump supporters defiantly refuse to talk to pollsters to any significant extent. It’s generally another way of Trump (and his supporters) trying to explain his soft approval rating. There was some discussion in the 2016 election that Trump supporters might feel social pressure to deny supporting him when speaking to a live interviewer, but that idea has been refuted.
If you’re a Trump supporter who wants to show support for the president but were for some reason inclined not to talk to a pollster, answer the poll! That’s the best way to boost his approval numbers.
So there’s your FAQ. Er, your FOC.
One last note. Whether you believe in the accuracy of polls matters to me personally only to the extent that I find bad and misinformed rhetoric frustrating. If you don’t want to believe in the accuracy of polls, don’t. Your not accepting the reliability of polls doesn’t decrease that reliability any more than your skepticism about gravity will suddenly result in your floating into the air. Polls are accurate and — with the repeatedly noted exceptions of polls in some key states — were broadly accurate in 2016.
Oh, and one more question:
Isn’t ‘horse race’ polling bad for America?
Not even getting into this.