The precision and volume of the information, including dozens of data points on individual Republicans, Democrats and independent voters, highlight the rising sophistication of the data-mining efforts that have become central to modern political campaigns.
In some cases, that included which voters are suspicious of Wall Street and pharmaceutical firms, or who reluctantly voted for Hillary Clinton or support the Affordable Care Act, Vickery said.
“They’re using this information to create political dossiers on individuals that are now available for anyone,” said Jeffrey Chester, executive director of the Center for Digital Democracy. “These political data firms might as well be working for the Russians.”
The data found by Vickery, who studies cybersecurity risk for the Silicon Valley start-up UpGuard, was compiled by GOP political consultant Deep Root Analytics, based on voter lists maintained by the RNC and augmented by other sources.
Deep Root did not disclose those sources, but political research firms for years have been collecting information on voters from data brokers, social media postings, polling and other contacts with voters.
The company also kept information on Americans’ voting histories and their reported enthusiasm for Trump, Vickery said. Some of the files assigned voters a score based on their views of 46 different issues ranging from immigration to trade. Nearly 170 gigabytes of the exposed data consisted of social media posts scraped from Reddit, he added.
Among the data are unique RNC identifiers for each voter, Vickery said. The files also potentially offered insight into party strategy for tracking and organizing voters.
“What is alarming about this now is that I believe it’s the first time RNC IDs and model data have been exposed,” said Matt Oszcowski, a veteran GOP political data strategist who recently started his own political fundraising company, Campaign Inbox. “This is not just a list of people; this is unique proprietary information which gives away [Republican] strategy and informs on targeting and methodology.”
The files do not appear to include Social Security or credit card information, as has leaked in some major commercial data breaches. Nor is it clear if anyone other than Vickery gained unauthorized access to the files during the two weeks they were left without a password or other security before the problem was discovered on June 12.
But malicious hackers routinely conduct such scans of the Internet looking for unprotected files they can exploit. And to those who may have found them, the files painted a detailed portrait of virtually all of America’s roughly 200 million voters — revealing their names, addresses, birth dates and phone numbers. The information was being stored by Amazon Web Services.
The voter files found by Vickery, he said, added up to “billions of data points” that, in the wrong hands, could easily be abused.
“With this data you can target neighborhoods, individuals, people of all sorts of persuasions,” said Vickery in an interview. “I could give you the home address of every person the RNC believes voted for Trump.”
In a statement, Deep Root blamed the lapse in security on a settings change, and said it had hired an outside firm to conduct an independent investigation. “We accept full responsibility, will continue with our investigation, and based on the information we have gathered thus far, we do not believe that our systems have been hacked,” Deep Root said.
Deep Root co-founder Alex Lundry said the data, which included proprietary information as well as publicly available voter data provided by state government officials, has been secure since new protocols were put into place on June 14. The exposure began on June 1, when Deep Root Analytics adopted updates that accidentally stripped away the password protections on the files.
“The RNC has halted any further work with the company pending the conclusion of their investigation into security procedures,” the RNC said in a statement. “While Deep Root has confirmed the information accessed did not contain any proprietary RNC information, the RNC takes the security of voter information very seriously and we require vendors to do the same.”
Amazon Web Services declined to comment about the security problem. (Amazon.com founder Jeffrey P. Bezos owns The Washington Post.)
The RNC poured more than $20 million into data services in the 2016 cycle, according to Federal Election Commission records. Of that, $6.2 million went to Data Trust, a data management firm that has an exclusive list-sharing agreement with the national party.
That allows the company to swap RNC voter data with independent big-money groups such as American Crossroads, American Action Network and the Koch political network, helping grow the party’s master voter file.
For its part, Deep Root Analytics worked for at least 14 GOP political committees in the 2016 cycle, FEC records show. Among its clients were the campaign committee of House Speaker Paul D. Ryan (R-Wis.) and his allied House super PAC; the Senate Leadership Fund, a super PAC aligned with American Crossroads and Senate Majority Leader Mitch McConnell (R-Ky.); and former Florida governor Jeb Bush’s presidential campaign and allied super PAC.
There are no reported payments from the RNC to Deep Root. However, the party spent $983,000 on “polling services/consulting” with a company called Needle Drop, which is a subsidiary of Deep Root, according to Advertising Age.
Both parties, as well as independent political groups, have been increasing their data-collection efforts for several campaign cycles. Privacy experts have warned for years that this has happened with little oversight from federal or state officials.
“Perhaps the biggest privacy problem here is the fact that the Republicans have all this information about voters in the first place,” said Peter Eckersley, chief computer scientist for the Electronic Frontier Foundation, a civil liberties group. “At some point in the past, parties picked a platform and voters decided on it. But with these databases, political operations can promise very different and increasingly contradictory things to different people, and that may be turning into a serious problem for democracy.”
Correction: An earlier version of this story incorrectly stated that the database that was vulnerable to theft belonged to the Republican National Committee. In fact, the data came from the RNC and other sources and was assembled by Deep Root.