The report isn't good news for Internet polling: it found large average errors across eight of the nine Web survey companies tested. Such samples are called "non-probability" samples because respondents are drawn from pools of volunteers who often receive rewards or other incentives for taking surveys. Because not every person in the population has a chance of being selected for the survey (volunteers self-select), results cannot be statistically projected to the population within a traditional margin of sampling error. Many news organizations, including The Post, avoid reporting on such results for this reason.
Most strikingly, Pew found its Internet-based American Trends Panel was not especially accurate on overall measures, despite the fact that respondents were initially recruited through traditional "probability-based" telephone sampling. Equally surprising is that one of the volunteer Web panels outperformed all others. By a lot.
The study also contained some sobering results about Web surveys' ability to accurately represent African American and Hispanic respondents. That finding is significant, since one of the biggest ambitions of Web-sampled surveys is accurately representing smaller demographic groups, such as racial minorities, that are prohibitively expensive to interview using traditional methods.
The chart on the right shows Pew's overall results. The identities of online survey vendors were shielded by denoting each with a letter, except for Pew's own American Trends Panel (ATP). The size of each bar represents the average difference between results from the tested survey samples and 20 high-quality federal benchmark surveys on questions ranging from smoking to volunteering to voting.
All surveys exhibited sizable errors — beyond any expected margin of error — but Sample I clearly stood ahead of the pack with an average bias estimate of 5.8 percentage points, nearly 1.5 points lower than any other sample and two points lower than Pew's ATP. Most surveys erred by an average of seven to eight percentage points from the "gold standard" federal surveys, but three were off by an average of nine points or more.
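An "average bias estimate" of the kind shown in the chart is simply the mean absolute gap between a sample's estimates and the federal benchmarks across the tested items. The sketch below illustrates the arithmetic with hypothetical figures, not the study's actual item-level data:

```python
# Sketch of an "average bias estimate": the mean absolute gap between a
# sample's estimates and federal benchmarks across tested items.
# All figures below are hypothetical, for illustration only.

benchmarks = {"smokes": 17.0, "volunteered": 25.0, "voted": 62.0}
sample_estimates = {"smokes": 14.0, "volunteered": 33.0, "voted": 70.0}

# Absolute deviation on each item, in percentage points.
gaps = [abs(sample_estimates[k] - benchmarks[k]) for k in benchmarks]
average_bias = sum(gaps) / len(gaps)
print(round(average_bias, 1))  # → 6.3 percentage points
```

Pew computed this kind of deviation across 20 benchmark items for each of the 10 samples tested.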
While the identity of Sample I is unknown, Pew offered some clues as to why it performed best. The sample "was notable in that it employed a relatively elaborate set of adjustments at both the sample selection and weighting stages," including on "several variables that researchers often study as survey outcomes, such as political ideology, political interest and internet usage."
Put another way: Sample I did not simply target and weight respondents to match demographics, such as age, sex and education, but also adjusted its sample to match estimates of political interest, partisan identity, ideology and others.
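One common way to perform this kind of adjustment is "raking" (iterative proportional fitting), which repeatedly scales respondent weights so the weighted sample matches population targets on each variable in turn. The sketch below is a minimal illustration, assuming made-up respondents and hypothetical targets — not Sample I's actual (undisclosed) procedure:

```python
# Minimal raking (iterative proportional fitting) sketch. Respondent data
# and population targets are hypothetical, for illustration only.

respondents = [
    {"age": "18-29", "interest": "high"},
    {"age": "18-29", "interest": "high"},
    {"age": "30+",   "interest": "high"},
    {"age": "30+",   "interest": "low"},
]

# Hypothetical population proportions for each adjustment variable.
targets = {
    "age":      {"18-29": 0.2, "30+": 0.8},
    "interest": {"high": 0.4, "low": 0.6},
}

weights = [1.0] * len(respondents)

for _ in range(50):  # iterate until the weighted margins converge
    for var, margin in targets.items():
        # Current weighted total in each category of this variable.
        totals = {level: 0.0 for level in margin}
        for w, r in zip(weights, respondents):
            totals[r[var]] += w
        grand_total = sum(weights)
        # Scale each respondent's weight so this margin hits its target.
        for i, r in enumerate(respondents):
            level = r[var]
            if totals[level] > 0:
                weights[i] *= (margin[level] * grand_total) / totals[level]

# The weighted sample now matches the targets on both variables at once.
```

Adding attitudinal variables like political interest to the targets, as Sample I apparently did, works the same way — the harder part is obtaining trustworthy population targets for attitudes, which is precisely why the approach is controversial.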
This approach is controversial, since political views are not fixed personal attributes but attitudes — though Pew's results suggest this approach may have advantages in reducing errors on other measures.
The less accurate non-probability surveys tended to have shorter interviewing periods and selected respondents only with respect to a few demographics, such as gender, age and region — both common practices in Web surveys.
The study's findings mark a distinct break from past research findings that probability-based panels like Pew's ATP were consistently more accurate than non-probability Web surveys and that no Web survey sample was much more effective than others.
Beyond a probability-based sampling approach, Pew's ATP panel surveys also were conducted with larger sample sizes (1,857-3,278 interviews) than non-probability Web surveys (just over 1,000), a factor that should have helped reduce sampling error. The report dug into why the ATP struggled on overall population estimates:
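To see why larger samples should help, here is the textbook margin-of-error formula for a proportion under simple random sampling — a simplification, since real panel surveys have design effects that make uncertainty larger:

```python
# Quick margin-of-error comparison using the standard simple-random-sampling
# formula for a proportion. Real panel surveys have design effects that make
# actual uncertainty larger; this is a simplified illustration.
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion p based on n interviews."""
    return z * math.sqrt(p * (1 - p) / n)

# Non-probability Web samples in the study: just over 1,000 interviews.
print(round(100 * margin_of_error(1000), 1))   # → 3.1 points
# The largest ATP sample in the study: 3,278 interviews.
print(round(100 * margin_of_error(3278), 1))   # → 1.7 points
```

Tripling the sample size cuts the theoretical sampling error nearly in half — yet the ATP's problems, as the report explains, came from bias, which no amount of additional interviews fixes.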
Pew Research Center’s probability-based panel, the ATP, does not stand out in this study as consistently more accurate than the nonprobability samples, as its overall strong showing across most of the benchmark items is undermined by shortcomings on estimates related to civic engagement. It had the lowest average estimated bias on measures unrelated to civic engagement (4.1 percentage points), but was essentially tied with three other samples as having the largest bias on those types of questions (13.4 points).
Indeed, all surveys reported higher levels of civic engagement than federal benchmark surveys, and Pew notes one potential cause of this bias in the ATP sample is the fact that panelists were recruited to join the panel at the end of a 20-minute survey focusing on political attitudes. Respondents who completed that survey may have been more politically engaged to start with, and those who chose to join the panel appear to have been even more politically interested.
The finding that political surveys overestimate engagement is not new, but it may be a reason Pew's results differ from past studies. A much-cited 2011 study by now-University of Texas assistant professor David Yeager and colleagues found probability-based surveys outperformed non-probability Web surveys in measuring demographics and other federal benchmarks, but the study did not examine the series of civic engagement measures analyzed by Pew.
Digging deeper, the Pew survey found most non-probability surveys struggled to represent views of Hispanic and African American adults, as well as young people and men. As the report notes, "across the nine nonprobability samples, the average estimated bias on benchmarked items was more than 10 percentage points for both Hispanics (15.1) and blacks (11.3). Sample I and the ATP are the only samples examined that have average benchmark deviations in the single digits for both of these subgroups."
Politically, the Web survey samples also tilted more Democratic than in traditional telephone surveys. Each sample found more self-identified Democrats than Republicans, though Democrats' party identification advantage ranged from 22 points in Web Sample E to five points in Sample B. Pew's telephone surveys over that period, which asked the same question, found Democrats with a six-point advantage (30 percent vs. 24 percent).
The Pew study does not have many short-term takeaways for political surveys in 2016 but does have big implications for the accuracy of polling in the long run. For one, some non-probability online surveys appear capable of producing more accurate results through more elaborate methods of making the sample representative across a range of characteristics, though the details of what makes them work are not well understood. Second, probability-based Web surveys like Pew's ATP are not guaranteed to be the most accurate across all domains, especially in areas that may have motivated respondents to join the panel (like civic engagement).
The study's last and most vexing result is that a survey's accuracy in representing the population on some attributes does not consistently translate to accuracy on others. The finding is logical, but it complicates the assumption that a method that succeeds on one type of measure will also succeed on others.
But when it comes to the accuracy of Internet polling, data are still very much being gathered.