The data on millions of Facebook users that a firm wrongfully swiped from the social network probably has spread to other groups, databases and the dark Web, experts said, making Facebook’s pledge to safeguard its users’ privacy hard to enforce.
But Paul-Olivier Dehaye, a privacy expert and co-founder of PersonalData.IO, said he suspects the data has already proliferated far beyond Cambridge’s reach. “It is the whole nature of this ecosystem,” Dehaye said. “This data travels. And once it has spread, there is no way to get it back.”
Zuckerberg said Facebook will investigate and audit thousands of third-party developers. Third-party apps could access data on Facebook users and their friends until 2015, when Facebook changed its rules. Experts question whether that push will yield any meaningful results. Dehaye questioned how Facebook would define which apps merit investigation and what would constitute “suspicious activity.”
Facebook said that it conducts manual and automated checks to make sure that developers are complying with its policies. It also plans to expand its bug bounty program so that people can report misuse of data.
Zuckerberg said in interviews Wednesday that the company is investigating reports that independent researchers and dark-web data brokers are trading user data grabbed by the firm Cambridge Analytica.
Frank Pasquale, a professor at the University of Maryland who specializes in algorithms and tech ethics, called this “the runaway data problem,” and said there is no way to return the genie to the bottle when it comes to securing data that has been released. Location and demographic information, which was taken from Facebook, can often be used to tie someone to other data points where the identity was previously unclear.
“The larger [the] data sets you get about individuals, the easier it is to use those to reidentify them in data sets where they think they're anonymous,” Pasquale said. “With a relatively small amount of data points, you can infer an incredible amount of very personal information about people.”
Facebook does not know whether other companies have shared or mishandled user data, and a forensic audit is ongoing, Zuckerberg told Wired magazine. Asked by Wired how confident he was that Facebook data had not gotten into the hands of Russian operatives or other groups, Zuckerberg said, “I can’t really say that. I hope that we will know that more certainly after we do an audit.”
For many of Facebook’s prime growth years, the company gave outside developers access to virtually everything that a user who authorized an app, or her friends, had posted on the social network: her home town, current city, events and location check-ins; her interests, groups and all the pages she’d liked; her relationship statuses with romantic partners, friends and family; her birthday, activities, work history and political and religious affiliations; and her photos, notes and videos.
Facebook changed its rules in 2015 amid concerns over how the data was being used. But for years, other developers had the power to construct the same kinds of massive microtargeted databases that had helped make Facebook so prominent. It’s unclear how many other services used that power or what they have done with the data pulled.
Zuckerberg said the company will “investigate all apps that had access to large amounts of information” before the rule change, a number he said is probably in the thousands. The company, he added, “will conduct a full audit of any app with suspicious activity” and probably will need to hire more workers to complete the audits. “We want to make sure that there aren’t other Cambridge Analyticas out there,” he told Wired.
The data shared with Cambridge Analytica was taken via a personality quiz, called “ThisIsYourDigitalLife,” that was initially approved by Facebook for research purposes.
It’s unclear how Facebook would find or recover users’ data. The data taken by the researcher Aleksandr Kogan, who provided it to Cambridge Analytica, “wasn’t watermarked in any way,” Zuckerberg told Wired. “And if he passed along data to Cambridge Analytica that was some kind of derivative data based on personality scores or something, we wouldn’t have known that, or ever seen that data.”
In the same year that Facebook severed ties with him, Kogan also started his own San Francisco-based survey data firm, Philometrics, raising questions about whether he took the Facebook data with him and used it for commercial purposes. (Kogan did not reply to repeated requests for comment.)
Apps and start-ups that grabbed user data over a number of years, Dehaye said, often hand over their data if they’re acquired by another company or sell their data if they close or liquidate.
Facebook opened the door to developers in 2007 in hopes of expanding its reach across the Web by making it easier for other sites to connect with the sprawling connective maps the network uses to link people by relationships and tastes, known as its “social graph” and “interest graph.”
Marketing firms have spent tens of millions of dollars to learn similar information — including compiling consumer surveys and purchasing massive consumer files from data brokers such as Experian and Acxiom — all of which came from different sources and had varying ages, precision and usefulness. Facebook’s wealth of data, on the other hand, was packed with detailed information volunteered by users themselves and offered completely free until the rule change took effect in 2015.
Facebook, Zuckerberg said, will now restrict the data that third-party developers can access to names, profile photos and email addresses, and will require developers to sign a contract before being allowed to ask users for rights to their posts.
Facebook said it will ban developers who misuse its data.
The sheer size of the data pulled from Facebook, experts say, is powerful on its own — and could prove valuable for marketers, political campaigns or other groups seeking to target users en masse.
“Getting good data on 50 million people from a relatively neutral, nonpartisan source that is diversely spread, and not just clustered in one tiny segment of the social graph — that’s a big deal,” said Matthew Hindman, a George Washington University associate professor who researches online campaigning and Internet politics. “If you can see that many people's activity on Facebook, you can guess pretty accurately what their partisanship might be, no matter how good your model is.”