Georgia Secretary of State Brian Kemp, a Republican who is running for governor against Democrat Stacey Abrams, has put the registrations of more than 53,000 voters on hold so far because the names in their voter records do not match those in other sources of identification, such as driver’s licenses and Social Security cards. If the measure takes effect, voters whose information does not exactly match across sources will need to bring a valid photo ID to the polls on Election Day to vote. That could suppress voter turnout, either because some voters lack IDs or because voters are confused about whether they are eligible. Proponents of the rule assert that it is only meant to prevent illegal voting.
But is a missing hyphen, an initial in place of a full middle name, or a one-letter discrepancy in a voter’s name good evidence that the voter is not who they say they are? How would we know?
Researchers often need to match records — and they have to get it right
As it happens, researchers ask that question all the time. In empirical research, they frequently need to link data sets by some imperfect identifier, such as agency names or individual addresses. While doing this can be tedious, getting the matches correct is crucial. Match the wrong records, and any analysis may be totally unreliable. That leads many data analysts to retain only exact matches.
But although incorrect matches can cause problems, so can dropping records that should be matched but have small discrepancies. Eliminating those records can also corrupt an analysis.
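To make the trade-off concrete, here is a minimal Python sketch. The names and the helper functions are invented for illustration, and the comparison relies on the standard library's difflib rather than on any method from the study. It shows only how a hyphen or a single letter defeats an exact comparison while the two strings remain nearly identical.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a name and drop everything except letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def exact_match(a: str, b: str) -> bool:
    """Character-for-character comparison, as an 'exact match' rule would apply."""
    return a == b


def similarity(a: str, b: str) -> float:
    """Similarity between 0 and 1, computed on the normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# Hypothetical record pairs: (name on registration, name on driver's license).
hyphen_pair = ("Maria Garcia-Lopez", "Maria Garcia Lopez")
typo_pair = ("Jonathan Smith", "Jonathon Smith")

for reg, lic in (hyphen_pair, typo_pair):
    print(f"{reg!r} vs {lic!r}: exact={exact_match(reg, lic)}, "
          f"similarity={similarity(reg, lic):.2f}")
```

Both pairs fail the exact comparison even though a human reader would treat them as the same person. A similarity score alone does not say where to draw the line, which is the gap a probabilistic approach tries to fill.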
That’s why I have spent the past three years helping to develop “fastLink,” an algorithm for probabilistic record linkage. It not only makes record linkage across data sets fast and automated, but also tells the analyst how likely it is that an inexact match between two records is actually correct.
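The general idea of attaching a probability to a candidate match can be sketched with the classic Fellegi-Sunter model of record linkage. The snippet below is not the fastLink code: the field names, the agreement probabilities, and the prior are all hypothetical values chosen for illustration. In practice such parameters are estimated from the data rather than set by hand.

```python
from math import prod

# Hypothetical parameters, for illustration only.
# M_PROBS[field]: probability the field agrees when two records are the same person.
# U_PROBS[field]: probability the field agrees when they are different people.
M_PROBS = {"first": 0.95, "last": 0.96, "birth_year": 0.98, "zip": 0.90}
U_PROBS = {"first": 0.02, "last": 0.01, "birth_year": 0.02, "zip": 0.05}
PRIOR_MATCH = 0.001  # assumed share of candidate pairs that are true matches


def match_probability(agreement: dict[str, bool]) -> float:
    """Posterior probability that a pair is a true match, via Bayes' rule.

    Assumes fields agree or disagree independently given match status.
    """
    like_match = prod(
        M_PROBS[f] if agrees else 1 - M_PROBS[f] for f, agrees in agreement.items()
    )
    like_nonmatch = prod(
        U_PROBS[f] if agrees else 1 - U_PROBS[f] for f, agrees in agreement.items()
    )
    numerator = PRIOR_MATCH * like_match
    return numerator / (numerator + (1 - PRIOR_MATCH) * like_nonmatch)


all_agree = match_probability(
    {"first": True, "last": True, "birth_year": True, "zip": True}
)
first_name_differs = match_probability(
    {"first": False, "last": True, "birth_year": True, "zip": True}
)
only_zip_agrees = match_probability(
    {"first": False, "last": False, "birth_year": False, "zip": True}
)

print(f"all fields agree:      {all_agree:.4f}")
print(f"first name differs:    {first_name_differs:.4f}")
print(f"only ZIP code agrees:  {only_zip_agrees:.6f}")
```

Under these made-up numbers, a pair that disagrees on one field still receives a substantial match probability, while a pair that agrees only on ZIP code receives almost none. An exact-match rule treats both cases identically.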
In a recent study co-authored with colleagues Ben Fifield and Kosuke Imai, we apply the algorithm to the question of voter identification. The results raise serious concerns about Georgia’s exact match law — and its likelihood of preventing tens of thousands of valid voters from casting ballots.
Here’s how we did our research
We worked on linking two nationwide voter files from 2014 and 2015 collected by L2 Inc, a national nonpartisan firm that supplies voter data and related technology for campaigns. All active voters in 2014 appeared in the 2015 data set — meaning that we knew a true match always existed. But many records had typographical discrepancies preventing exact matches.
Our analysis found that the “exact match” approach would link only 66 percent of voters who were actually the same, correctly identifying about 91 million voters. In other words, “exact matching” would exclude nearly 40 million records that actually did refer to the same voter, disenfranchising quite a few Americans.
What does this mean for Georgia’s voters?
Georgia’s records had a higher proportion of exact matches than we found nationwide — but 30 percent of actual voters still failed to exactly match in that state.
By contrast, using our algorithm, which correlates with L2’s in-house matching records nearly perfectly (r=.99), we are able to match almost 127 million registered voters — or 93 percent of all voters in the 2014 data. Among those whose records did not exactly match, we found that 25 percent have at least a 99 percent probability of being correct matches, while 28 percent have at least a 95 percent probability.
As an illustration, using our algorithm, 91 percent of those on Georgia’s voter rolls would be cleared to vote, or 3,941,342 voting-eligible citizens — while “exact matching” clears only 70 percent, or 3,031,802 eligible citizens.
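The gap between the two Georgia figures can be checked with simple arithmetic. The sketch below uses only the numbers quoted above. The implied total is a back-of-the-envelope derivation from the rounded 70 percent figure, not a number reported in the study.

```python
# Figures quoted in the text for Georgia.
cleared_by_algorithm = 3_941_342   # cleared using the probabilistic algorithm
cleared_by_exact = 3_031_802       # cleared using "exact matching"
exact_share = 0.70                 # rounded share reported for exact matching

# Citizens cleared by the algorithm but not by exact matching.
additional_cleared = cleared_by_algorithm - cleared_by_exact

# Rough size of the voter roll implied by the rounded 70 percent figure.
implied_total = cleared_by_exact / exact_share

# Share cleared by the algorithm, recomputed from the implied total.
algorithm_share = cleared_by_algorithm / implied_total

print(f"additional citizens cleared: {additional_cleared:,}")
print(f"implied voter roll size:     {implied_total:,.0f}")
print(f"algorithm share:             {algorithm_share:.0%}")
```

The recomputed share agrees with the 91 percent quoted above, so the two counts and the two percentages are consistent with one another.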
I also attempted to link the voters in the 2016 American National Election Study (ANES) with voter records contained in the L2 data using two methods: exact matching, and an improved version of fastLink I recently developed.
The results appear in the chart below. As you can see, the “exact matching” method misses a substantial share of valid matches. While our algorithm validated 60 percent of the voter records, “exact matching” validated less than 30 percent, on average.
And in keeping with the concerns of opponents of the Georgia measure, nonwhite voters are especially likely to be harmed. The match rates using exact matching are nine and six percentage points lower for black and Hispanic voters, respectively, than for white voters.
Georgia’s “exact match” law is the latest in a string of voter identification measures that critics allege are thinly veiled voter suppression tactics. Whether intended that way or not, Georgia’s “exact match” rule will disproportionately affect minority voters.
Editor’s note: This post was edited to correct the name of Georgia Secretary of State Brian Kemp, and to clarify that voters may be prevented from voting if no additional checks are undertaken.
Ted Enamorado (@TedEnamorado) is a PhD candidate in the politics department at Princeton University.