The study compares three types of polls: random digit dial (RDD) telephone, internet probability, and internet non-probability. The authors checked the accuracy of the primary demographics used to weight the samples, of secondary demographics not used in weighting, and of answers to questions about respondents' daily health habits (which behave much like the secondary demographics). Estimates are compared to benchmarks first without post-stratification and then with it. The benchmarks are government surveys. There was one RDD poll, one probability internet poll, and seven non-probability internet polls.
1) It should be no surprise that the probability samples have more accurate primary demographics before weighting. After weighting there is no difference, almost by definition, since weighting forces the primary demographics to match their targets. At this point I will stop commenting on the unweighted results: they are not meaningful for any academic or practitioner report, because unweighted estimates would never be used in practice.
2) After weighting, the probability samples are slightly, but statistically significantly, more accurate on the secondary demographics and the other answers. The average absolute error is 2.9 and 3.4 percentage points for the two probability samples, while for the non-probability samples it ranges from 4.5 to 6.6, with a mean and median of 5.2. The largest single errors for the probability samples were 9.0 and 8.4 points; for the non-probability samples they ranged from 10.0 to 17.8, with a mean of 13.5 and a median of 13. This seems to be the strongest basis for Yeager et al.'s conclusion that probability samples are more accurate.
Small Issues:
(a) The benchmarks for the secondary demographics (the American Community Survey and the Current Population Survey) are fine, but is it clear that the probability samples were not built around these demographics?
(b) More questionable is treating the National Health Interview Survey (NHIS) as ground truth for the other answers, all of which concern health. The NHIS is itself just a survey, and a probability survey at that, so it is likely to agree more closely with the probability samples.
(c) Also, most standard methods for handling non-probability samples fit a regression model before post-stratifying. Non-probability samples may benefit more from that regression step than probability samples do, because selection is extremely poor in some demographic cells.
Big Issues:
(a) Even if I took their results at face value, the non-probability errors may be statistically significantly larger, but is that extra error not a price worth paying at 1% of the cost and a fraction of the time? Academic publications miss key variables when they compare survey designs on accuracy alone: you must also consider cost and speed.
(b) The point of showing unweighted estimates was to demonstrate that raw selection is not a problem for probability polling, despite its low response rates. But while this paper was published in 2011, the surveys are from 2004-5. Unfortunately, in the time it took to go from survey to publication, the response rate for RDD probability polling plummeted from 25% to 9% (http://www.people-press.org/2012/05/15/assessing-the-representativeness-of-public-opinion-surveys/), so whatever the unweighted comparison showed then may not hold for today's RDD polls.
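To make small issue (c) concrete, here is a minimal Python sketch of fitting a regression before post-stratifying, compared with plain post-stratification of raw cell means. It assumes a simple additive (main-effects) linear model on two hypothetical weighting variables, age group and gender, fitted by backfitting; real applications such as multilevel regression and post-stratification (MRP) use richer, partially pooled models. The function names, data, and population counts are all made up for illustration and do not come from Yeager et al.

```python
from collections import defaultdict

# Hypothetical toy data: each row is (age_group, gender, outcome), outcome 1 = "yes".
# None of these numbers come from Yeager et al.


def poststratify(cell_estimates, population):
    """Plain post-stratification: weight each cell's estimate by its population count."""
    total = sum(population.values())
    # Raises KeyError if a population cell has no estimate (e.g. no respondents).
    return sum(n * cell_estimates[cell] for cell, n in population.items()) / total


def raw_cell_means(rows):
    """Sample mean of the outcome within each (age_group, gender) cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for age, sex, y in rows:
        sums[(age, sex)] += y
        counts[(age, sex)] += 1
    return {cell: sums[cell] / counts[cell] for cell in counts}


def fit_additive(rows, iters=200):
    """Least-squares fit of y ~ mu + a[age] + b[gender] by backfitting.

    Each pass refits the age effects holding the gender effects fixed, then
    the reverse; the fitted values converge to the least-squares fit.
    """
    mu = sum(y for _, _, y in rows) / len(rows)
    a, b = defaultdict(float), defaultdict(float)
    for _ in range(iters):
        sums, counts = defaultdict(float), defaultdict(int)
        for age, sex, y in rows:
            sums[age] += y - mu - b[sex]
            counts[age] += 1
        for age in counts:
            a[age] = sums[age] / counts[age]
        sums, counts = defaultdict(float), defaultdict(int)
        for age, sex, y in rows:
            sums[sex] += y - mu - a[age]
            counts[sex] += 1
        for sex in counts:
            b[sex] = sums[sex] / counts[sex]
    return mu, a, b


def regression_then_poststratify(rows, population):
    """Predict every population cell from the additive model, then post-stratify."""
    mu, a, b = fit_additive(rows)
    # A level never seen in the sample silently gets an effect of 0 here.
    predicted = {(age, sex): mu + a[age] + b[sex] for age, sex in population}
    return poststratify(predicted, population)


if __name__ == "__main__":
    rows = (
        [("young", "m", y) for y in (1, 0, 0, 0, 0)]
        + [("young", "f", y) for y in (1, 1, 1, 0, 0, 0, 0, 0, 0, 0)]
        + [("old", "m", y) for y in (1, 0)]  # no older women responded
    )
    population = {("young", "m"): 30, ("young", "f"): 30,
                  ("old", "m"): 20, ("old", "f"): 20}
    # poststratify(raw_cell_means(rows), population) would raise KeyError.
    print(round(regression_then_poststratify(rows, population), 3))  # 0.37
```

In the toy data no older women responded, so plain post-stratification of raw cell means has nothing to put in that cell, while the model fills it in from the age and gender effects estimated on the other cells. That is one sense in which badly selected cells can gain more from the regression step; whether the additive assumption is reasonable is an empirical question for any real poll.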
April 11, 2014 at 10:00 AM EDT