But we’re not seeing that in the polls. Current polling methods don’t accurately sample minority voters. That failure particularly skews our understanding of the Democratic electorate — and limits the candidates who will be on tonight’s debate stage.
Democratic primary voters are nearly evenly split between people of color and whites
In 2016, the Democratic primary electorate was 42 percent nonwhite, according to the American National Election Studies. In 2020, it should be about 46 percent nonwhite. That’s true, in part, because of younger nonwhites registering to vote in increasing numbers — and that group is more likely to prefer Democrats.
Between 2014 and 2018, people of color grew their share of the electorate by 3 percent. The increase was even larger among Democrats, which suggests that assessments based on exit polling may underestimate it. If so, any national poll’s sample of Democratic primary voters today should be about half white and half people of color.
Unless pollsters take that into account, their results will be wrong
Presidential primary polling is important for many reasons. One is that the Democratic National Committee sets a minimum showing in the polls as one criterion candidates must meet to appear in its national debates — and appearing in those debates is how candidates reach a national primary audience. The DNC does not commission any original polls. It relies on measurements by media companies and pollsters. So do these polls accurately reflect the Democratic electorate?
The DNC imposes no methodological criteria on external pollsters when it comes to racial demographics or bilingual polling. It depends on individual media companies and pollsters to “get it right.”
That leaves pollsters with two accuracy challenges: surveying enough nonwhites and ensuring that the sample represents the full array of people of color. So far, they are meeting neither. For instance, none of the qualifying polls has offered the survey in any Asian languages, even though Asian Americans are expected to make up 7 percent of Democratic voters nationally. And with Latinos making up an estimated 17 percent of Democratic voters, all qualifying polls should be available in Spanish.
That’s important. Academic research suggests that offering surveys in Spanish and Asian languages results in more accurate and representative voter samples. Consider Nevada in 2010: FiveThirtyEight’s Nate Silver concluded that polls there failed to predict Harry M. Reid’s five-point victory in the Senate race partly because they didn’t offer polling in Spanish and included too few Latinos in their samples.
That mistake is being repeated. Many 2020 polls are conducted only in English or offer Spanish callbacks rather than immediate options, which results in sampling too few Spanish-speaking voters. As a result, too many of the “mainstream” polls significantly oversample college-educated and higher-income minority voters simply because they’re easier to reach than those with lower socioeconomic status.
Mainstream polls significantly undersample people of color and oversample white people
Using polls from the RealClearPolitics polling aggregator, I compared the benchmark demographics of the Democratic electorate — 51 percent white, 25 percent black, 17 percent Latino, 7 percent Asian American and Pacific Islander (AAPI) — with the demographics in recently released polls. For the benchmarks, I examined the racial composition of the general electorate and the Democratic electorate for the past 12 years, looking at census, exit poll and ANES data. Unfortunately, few pollsters reveal their complete racial demographics. But among those that provide demographic data, every one included too many white voters and too few minorities.
For example, a Fox News poll included a sample of probable Democratic primary voters that was 66 percent white and 34 percent nonwhite. Polls by Economist/YouGov, Emerson and CNN also had samples in which whites were 62 percent or more of likely Democratic primary voters. (These figures are not in the publicly released information, but they can be calculated from it by identifying the number of whites, blacks, Latinos, etc. who plan to vote in a Democratic primary and then estimating each racial group’s share of the Democratic primary electorate.) In a December Monmouth University poll, 58 percent of Democrats interviewed were white, even with sampling weights applied.
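To make the calculation in that parenthetical concrete, here is a minimal Python sketch of the two steps: count the likely Democratic primary voters in each racial group, then convert those counts to shares. The function name, the group labels and every number in it are hypothetical placeholders chosen for illustration. They are not taken from any of the polls named above, and this is not necessarily the exact procedure used for the figures in this piece.

```python
# Sketch: estimate the racial makeup of a poll's likely Democratic primary
# voters from crosstab-style figures.
# ASSUMPTION: all numbers below are hypothetical, not from any real poll.

def primary_composition(respondents, primary_rate):
    """Return each group's share of likely Democratic primary voters.

    respondents:  group -> number of people interviewed in that group
    primary_rate: group -> fraction of that group who say they plan
                  to vote in a Democratic primary
    """
    # Step 1: likely Democratic primary voters in each group.
    voters = {g: respondents[g] * primary_rate[g] for g in respondents}
    # Step 2: each group's share of all likely Democratic primary voters.
    total = sum(voters.values())
    return {g: n / total for g, n in voters.items()}


# Hypothetical full-sample counts and primary-participation rates.
respondents = {"white": 700, "black": 120, "latino": 130, "aapi": 50}
primary_rate = {"white": 0.40, "black": 0.75, "latino": 0.50, "aapi": 0.50}

shares = primary_composition(respondents, primary_rate)
for group, share in shares.items():
    print(f"{group}: {share:.1%}")
```

The resulting shares can then be set against the benchmark composition to see which groups a poll over- or underrepresents.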
Worse, samples are often so small that results aren’t even presented for black, Latino or AAPI voters — as in a Fox News poll that reports blacks and Latinos as “N/A.”
When data are reported by race, every poll examined finds differences in candidate preference between white and nonwhite voters. Compared with white voters, a higher percentage of black voters support Joe Biden and Sen. Cory Booker (N.J.). Among Latino voters, more support Sen. Bernie Sanders (I-Vt.) and Julián Castro. By contrast, Pete Buttigieg’s supporters are nearly all white.
Here’s why that matters
If we re-weight polls using the Democratic electorate’s expected racial composition, we can estimate how much racial sampling bias skews candidate support.
For example, Booker gets considerably more support from minorities. With Harris out of the race, he is getting about 24 percent of the black vote and 22 percent of the Latino vote in a recent YouGov poll. Because those two groups are undersampled in media polls, I estimate that sampling bias probably cost him about 2-3 percentage points in the final poll rankings. That may have kept Booker out of tonight’s debate: He needed 4 percent support to qualify but usually reached about 3 percent in public polls.
Similarly, a recent Texas poll showed Castro commanding 13 percent of the Latino vote and 14 percent of the black vote. Even if Castro didn’t poll as strongly outside his home state, it is still plausible that, like Booker, he lost one to two points in national polls because they undersampled black and Latino Democrats. And he, too, fell just short of appearing in tonight’s debate.
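The re-weighting idea behind these estimates can be sketched in a few lines of Python. The benchmark composition below is the one given earlier in this piece (51 percent white, 25 percent black, 17 percent Latino, 7 percent AAPI). Everything else is an assumption made for illustration: the candidate’s support within each group is invented, and the poll’s sample composition uses a 66 percent white share with a made-up split of the remaining 34 percent. This is a simplified sketch, not a reproduction of the estimates reported above.

```python
# Sketch: how much does a candidate's topline change when a poll's racial
# composition is re-weighted to a benchmark electorate?

# Benchmark composition of the Democratic electorate, as stated in the article.
BENCHMARK = {"white": 0.51, "black": 0.25, "latino": 0.17, "aapi": 0.07}

# ASSUMPTION: hypothetical poll sample composition (66% white; the split of
# the remaining 34% among the other groups is invented for illustration).
SAMPLE = {"white": 0.66, "black": 0.17, "latino": 0.12, "aapi": 0.05}

# ASSUMPTION: hypothetical support for an unnamed candidate within each group.
SUPPORT = {"white": 0.01, "black": 0.08, "latino": 0.06, "aapi": 0.02}


def topline(support_by_group, composition):
    """Overall support = sum over groups of (group share x support in group)."""
    return sum(support_by_group[g] * composition[g] for g in composition)


raw = topline(SUPPORT, SAMPLE)            # topline as the poll sampled it
reweighted = topline(SUPPORT, BENCHMARK)  # topline under the benchmark mix
shift = reweighted - raw                  # effect of the racial sampling skew

print(f"as sampled:  {raw:.1%}")
print(f"re-weighted: {reweighted:.1%}")
print(f"difference:  {shift * 100:+.1f} points")
```

Because the hypothetical candidate draws more support from groups the sample underrepresents, the re-weighted topline comes out higher than the sampled one. For a candidate whose support is concentrated among white voters, the same calculation would move the topline down.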
Two states that are more than 90 percent white — Iowa and New Hampshire — wield outsize power in the Democratic primary because they come first. The “qualifying” polls that help narrow the field may be underrepresenting nonwhite voters. This could be preventing candidates of color from participating in the debates — and reducing their chances of breaking to the front of the pack.
Correction: The original version of this post stated that the Democratic primary electorate in 2016 was 45 percent non-white, according to the American National Election Studies. This has been corrected to 42 percent non-white, and the projected percent non-white in the 2020 Democratic primary electorate has been revised to 46 percent. The original version also estimated that Cory Booker is currently receiving 10 percent of the black vote and 5 percent of the Latino vote. This has been updated to reflect the percentages in the YouGov poll linked in the piece, which found that Booker is receiving 24 percent of the vote among blacks who intend to vote in a Democratic primary, and 22 percent among Latinos who intend to vote in a Democratic primary. This increases the estimated effect of undersampling minorities to 2-3 percentage points. The original version stated an estimate of the racial composition of Democratic respondents in a Monmouth University poll (71%) that did not reflect sampling weights. This has been updated to the weighted figure (58%). This updated version also includes more explanation of how the racial composition of likely Democratic primary voters was estimated based on public information from several current polls.