Every few months, the Washington Post (in partnership with ABC News) offers the world a brand new bit of insight. This week, it was that Donald Trump leads the Republican field seeking the party's presidential nomination, a result that likely prompted more than a few people to wonder, How did that happen? We are here to answer that question in the most literal way possible.
You're probably aware that polling involves talking to a random group of people out in the world, but beyond that, it's likely a bit of a black box. We decided to ask Scott Clement, one half of the Post's polling team, to explain in aggressively minute detail how a poll moves from concept to final numbers.
Fix: Let's start at the beginning. How do you decide what to ask?
Clement: We decide what to ask based on topics that are interesting in the news and policies that are being proposed or candidates that are running for election. Deciding the topics usually isn't that difficult; we know what's interesting or what's in the news.
But asking polling questions is a complex art. There's a lot of science behind better ways to ask questions, but there's no clear definition where you can say, "That's a good question." There are no perfect questions but there are questions with different types of problems.
The two key things: Ask what you actually want to know about, and ask questions that are simply worded and can be understood by everybody.
If you want to measure presidential approval, this is simple. We don't say, "Do you approve of the job the president is doing?" We say, "Do you approve or disapprove of the job the president is doing?" because we don't want to emphasize one answer or the other. You want to offer options that represent the basic choice that's at hand. Sometimes that's a simple choice: approval, disapproval. Sometimes it may be a range, like, "How confident do you feel that an Iran nuclear deal will actually work?" You can be extremely confident, you can be not confident at all, you can be somewhere in the middle. We know from our polling that people tend to the negative side of that scale.
Questions need to be understood by everybody that is answering the survey. They need to be understood by people that are Ph.D.s, and people that have less than a high school diploma. That means that there's a real priority on simplicity -- not using jargon, complex words or things that can be confusing to people.
The other big challenge in asking questions is being careful about how questions are ordered in a survey. If I ask you about five different things that might go wrong if America's auto industry gets in trouble, and then I ask you whether you think the federal government should offer loans to the auto industry, you may give me a different answer than if I'd just asked you simply about loans to the auto industry.
If you want to ask about 20 different things on a survey, you can't ask all of them first. There's always a potential that the questions you ask earlier could influence the answers you get to later questions.
Fix: There are set questions that we always ask -- presidential approval, direction of the country. How often do those change?
Clement: There are a few reasons to ask questions repeatedly. One is that some questions are particularly important. How popular a president is is at the center of the political world. But the other is that they help you gauge lots of other things.
For instance, people may overwhelmingly support some form of legal status for immigrants, but the approval of President Obama is a lot lower. That tells you something about President Obama's approval: It's not tied to the popularity of immigration reform. So it provides a useful barometer.
When you want to track change over time, it's very important that you ask questions in an identical manner or an almost identical manner. Surveys are not very precise measures. Typical national surveys have a three-point margin of error. You're not going to capture a two-point change in attitudes from poll to poll. What you can do over time, though, by asking the same questions again and again and again, is see those slow changes pile up into something bigger.
Asking about approval helps us ground our political conventional wisdom in something more rigorous than, "Oh, Obama looks good today." It tells us something more real.
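For readers who want to see the arithmetic behind the "three-point" figure, here is a minimal sketch. It assumes a simple random sample, a 50-50 split of opinion (the worst case) and a 95 percent confidence level; the function names are ours for illustration, not anything the Post's polling team uses, and a real poll's published margin also reflects weighting.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of sampling error, in percentage points, for a simple random sample of size n."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

def margin_of_change(n1, n2, p=0.5, z=1.96):
    """Margin of error, in points, on the difference between two independent polls."""
    return 100 * z * math.sqrt(p * (1 - p) / n1 + p * (1 - p) / n2)

single_poll = margin_of_error(1000)
poll_to_poll = margin_of_change(1000, 1000)

print(f"One poll of 1,000 interviews: +/- {single_poll:.1f} points")
print(f"Change between two such polls: +/- {poll_to_poll:.1f} points")
```

Under these assumptions, one poll of 1,000 comes out to roughly plus-or-minus 3.1 points, and the gap between two such polls to roughly plus-or-minus 4.4 points, which is why a two-point shift from one poll to the next can't be told apart from noise.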
Fix: Who's the pollster we use, and what's their role?
Clement: We work with a couple of different firms. One is Abt SRBI.
They're a field house. They hire and train professional telephone interviewers. They take the questions that we design with ABC News and put them into a software program that interviewers can read while they're on the phone and enter answers into. They help inform how many cell phones and landlines we call and they go out and purchase that sample from a provider.
One of the things about a national political telephone survey is it's a lot of activity over a very short period of time. You have over 1,000 completed interviews over a four-day period, so that needs a very large staff of professionals to make these phone calls and conduct these interviews.
And then they deliver the data, of course.
Fix: How do you decide how many people to poll?
Clement: The sample size that we typically interview is 1,000. It provides some consistency across surveys when we do them of the same size.
The other benefit is it allows you to examine attitudes among most of the major political demographic groups. You can look at conservative Republicans. You can look at liberal Democrats. You can look at whites with and without college degrees. Typically, you can look at African-Americans and Hispanics -- with a fairly large margin of sampling error but still over 100 interviews, which is our typical threshold.
Fix: So the pollster is ready to go. How does the actual process work?
Clement: You start off with a big sample of telephone numbers. It includes all of the area codes and exchanges [the first three numbers] in the country. You then draw a systematic random sample from that. You pick a random starting point and say, I want 5,000 phone numbers to be drawn from here. You figure out how long the list is, divide it by 5,000, and draw your random sample from each one. Those numbers go into a file that has all the numbers you could potentially call for a survey. Then those numbers are organized by region.
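The systematic random sample Clement describes can be sketched in a few lines. This is an illustration only: the list of phone numbers is made up (fictional 555 numbers), the sample is 50 rather than 5,000, and `systematic_sample` is a hypothetical helper, not the sample provider's actual procedure.

```python
import random

def systematic_sample(frame, k, rng=None):
    """Draw k items from an ordered list: random start, then one item per fixed interval."""
    rng = rng or random.Random()
    interval = len(frame) / k          # how long the list is, divided by the sample size
    start = rng.random() * interval    # random starting point inside the first interval
    last = len(frame) - 1
    # Take one number from each successive interval; min() guards against float rounding.
    return [frame[min(int(start + i * interval), last)] for i in range(k)]

# Hypothetical frame of 10,000 made-up telephone numbers.
frame = [f"202-555-{i:04d}" for i in range(10000)]
sample = systematic_sample(frame, 50, random.Random(2015))

print(sample[:3])
```

Because the draw steps through the list at a fixed interval, the selected numbers are spread evenly across the whole frame rather than clustering in one stretch of it.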
Meanwhile, the interviewers have gone through training. They've studied the questionnaire as a group. They've been working through the questions -- how to pronounce all the candidates' names, how to read things clearly.
They sit down at their desk and one of two things happens. If they're calling landline numbers, there's an automated dialer that's going to feed them calls. It will keep calling phone numbers until they get a pick-up, when it will route it to an interviewer.
Cell phones, because of a federal regulation, have to be hand-dialed. The interviewer will have a number, hand-dial it and wait for someone to pick up. If someone picks up, they ask the person if they're 18 or over [then] they'll roll them into the survey.
The cost tends to be about twice as much to complete a cell-phone interview. At the same time, the cell-phone-only population has increased dramatically, so polls that want to obtain representative samples of the public have generally moved to calling more and more cell phones. We've upped again our sampling of cell phones in our latest survey to include 65 percent of surveys on cell phones. Back in 2007, we were calling all landlines.
In every survey, we translate the questionnaire [in]to Spanish. If we reach people who have a language difficulty, we typically call back to complete with a Spanish speaking interviewer. In 2013, we switched over to including Spanish language in every survey. It's definitely becoming very important for the survey to include Spanish-language interviewing because a significant portion of the audience doesn't speak English.
That describes the first time someone tries to call. Everybody who they try to call and get a voice message or something that indicates a live number, they'll call again. They'll call them several times over the field period [the days the poll is "in the field," meaning when interviewers are making calls] in order to try to complete an interview. The main principle behind that is that we understand some people are easier to get on the phone than others. We don't want our sample to be biased toward people who are especially available to take a survey. We want people who are also difficult to reach.
Survey respondents tend to be a little bit more engaged, more likely to be registered to vote than other people. But generally, we want to give people multiple opportunities to be reached so that your chance of being surveyed is the same if you went out for dinner on Friday night as if you went out for dinner on Saturday night, or if you work in the day versus if you don't.
Fix: Once the calls are done, how are they analyzed statistically?
Clement: Our surveys are weighted by a number of different demographic parameters to the latest available estimates from the Current Population Survey by the Census Bureau.
The types of things we're weighting for are to ensure that the final sample matches population estimates on sex, on age, on education, on race and on ethnicity. There's also weighting to ensure that the proportion of the sample who has certain access to phones matches estimates from the National Health Interview Survey, which is, interestingly, the federal government benchmark for how many people are landline only or have access to cell and landlines.
The pollster processes that and delivers a file that includes the weight variable.
Weighting is good and surveys should do it! Surveys rarely come back as a perfect representation of the general public. We're lucky in the United States to have excellent data on the demographics of the population: age, gender, where they live, race and ethnicity. To the extent that surveys can correct for biases in their sample, that's a good thing for surveys.
Fix: My understanding of how weighting works is: You get your data back and it has 54 percent women and 46 percent men, whereas the actual population of the country is, I don't know, 52 percent women. So you apply that percentage difference to the data so that it then matches. Is that a good description of it?
Clement: Yes. The only added layer is explaining how this is done with multiple factors. It's called "raking," a term which has never been that intuitive to me.
Say we just have sex and age. I have men and women, and five different age groups. I have too many women, so I'm going to give women a weight of something below one and men a weight of something above one. Then I do the same thing for age. For each age group, if they're underrepresented or overrepresented, they get a weight of slightly higher than one or slightly lower than one. Then I go back to sex and do the same again. And then I go back to age, and do the same again. Eventually they converge.
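The back-and-forth Clement describes can be sketched as a short loop. Everything in the example is invented for illustration: the 100 respondents, the three age groups (he mentions five), and the population targets are made-up numbers, and `rake` is a hypothetical helper rather than the pollster's production code.

```python
def rake(respondents, targets, max_iter=100, tol=1e-6):
    """Adjust weights one variable at a time until every variable matches its target shares."""
    weights = [1.0] * len(respondents)
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for variable, shares in targets.items():
            total = sum(weights)
            # Current weighted count of each category (e.g. women, men).
            weighted_count = {}
            for person, w in zip(respondents, weights):
                category = person[variable]
                weighted_count[category] = weighted_count.get(category, 0.0) + w
            # Overrepresented categories get a factor below one, underrepresented above one.
            factors = {c: shares[c] * total / weighted_count[c] for c in weighted_count}
            weights = [w * factors[person[variable]] for person, w in zip(respondents, weights)]
            largest_adjustment = max(largest_adjustment,
                                     max(abs(f - 1) for f in factors.values()))
        if largest_adjustment < tol:   # nothing moved in a full pass: converged
            break
    return weights

def weighted_share(respondents, weights, variable, category):
    """Share of the total weight held by respondents in one category."""
    matching = sum(w for person, w in zip(respondents, weights) if person[variable] == category)
    return matching / sum(weights)

# Made-up sample: 54 women and 46 men, spread over three age groups.
cells = {("female", "18-34"): 12, ("female", "35-54"): 18, ("female", "55+"): 24,
         ("male", "18-34"): 10, ("male", "35-54"): 16, ("male", "55+"): 20}
respondents = [{"sex": s, "age": a} for (s, a), n in cells.items() for _ in range(n)]

# Made-up population targets, for illustration only.
targets = {"sex": {"female": 0.52, "male": 0.48},
           "age": {"18-34": 0.30, "35-54": 0.33, "55+": 0.37}}

weights = rake(respondents, targets)
print(round(weighted_share(respondents, weights, "sex", "female"), 3))
print(round(weighted_share(respondents, weights, "age", "18-34"), 3))
```

Each pass fixes one variable and slightly disturbs the other, which is why the loop keeps alternating until the adjustments shrink to almost nothing.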
The main element on margin of sampling error [what we usually just refer to as margin of error] is how much you would expect things to deviate if you had done the same sample 100 times or 1,000 times. That's just one form of error in a survey. We fixate on it because it's something statistical that we can wrap our heads around -- "plus-or-minus three" is very nice and neat -- but there are many other types of errors that are difficult or impossible to quantify.
When you go out and weight data, what you're doing is you're trying to make your measure more precise. You're trying to correct for biases in your initial sample. But when you weight, you're also increasing random noise or variance. So there's a trade-off between weighting by too many things and increasing random noise. That's why the margin of sampling error is higher when you weight data. That's why some margins of sampling error are higher than they were 15 years ago. They're accepting a higher margin of sampling error in order to correct for some of those factors.
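One common way to put a number on that trade-off is Kish's approximate design effect, which grows as the weights become more unequal. The interview doesn't say which formula the Post's pollster uses, so treat this as a sketch under that assumption; the weights below are invented and the function names are ours.

```python
import math

def kish_design_effect(weights):
    """Kish's approximation: n * sum(w^2) / (sum(w))^2. Equals 1 when all weights are equal."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2

def weighted_margin_of_error(weights, p=0.5, z=1.96):
    """Margin of sampling error, in points, using the effective sample size after weighting."""
    effective_n = len(weights) / kish_design_effect(weights)
    return 100 * z * math.sqrt(p * (1 - p) / effective_n)

# Invented weights: 1,000 interviews, unweighted versus noticeably uneven weights.
unweighted = [1.0] * 1000
weighted = [0.5] * 500 + [1.5] * 500

moe_unweighted = weighted_margin_of_error(unweighted)
moe_weighted = weighted_margin_of_error(weighted)

print(f"Design effect: {kish_design_effect(weighted):.2f}")
print(f"Margin of error: {moe_unweighted:.1f} -> {moe_weighted:.1f} points")
```

With these made-up weights the design effect is 1.25, so 1,000 interviews behave like about 800, and the margin of error rises from roughly 3.1 points to roughly 3.5.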
We need to understand what things are demographics and what things aren't demographics. Most national public polls that you will see are measuring party identification as an attitude. There's not a Census measurement for party identification, nor should there be because it's something that changes over time. We should expect polls to vary both in their ideological composition and their partisan composition. Not drastically, but we should expect polls to be different from one another.
Fix: What other analysis do we do?
Clement: Our analysis when we get the data is driven by what's interesting. These polls are designed to first show what the general public thinks on a particular issue. That's pretty plain and easy to see; it doesn't require a great deal of statistical analysis.
The other element of analysis that we look at is how each question breaks down by a variety of different demographic groups. Basic questions are: This percentage of people supports a change to Social Security policy. Do Democrats and Republicans have strongly differing views on this? Typically we're used to seeing strongly differing views, so it's surprising when they aren't different. That's the kind of thing we're looking for.
If you have a theory that maybe people of a certain demographic characteristic have a really unique opinion on some issue, and you conduct your statistical analysis and find that being in that demographic group doesn't make a difference, that could warn you off writing a story. Or it could confirm that, in fact, this factor is really important and you should feel confident about that.
So there you go. Start to finish.
If you want to conduct your own national survey, by the way, ORC will charge you $1,000 to add a question to its regular poll of 1,000 Americans. Maybe see if you can figure out why Trump is polling so well.
This interview has been edited and ordered for clarity. Two erroneous transcriptions were updated, and the year that the Post began including Spanish language calls was corrected.