Facebook provided a data set to a consortium of social scientists last year that had serious errors, affecting the findings in an unknown number of academic papers, the company acknowledged Friday.
The data, intended for research on the effect of social media on elections and democracy, includes the web addresses Facebook users click on, along with other information.
The error resulted from Facebook accidentally excluding data from U.S. users who had no detectable political leanings — a group that amounted to roughly half of all of Facebook’s users in the United States. Data from users in other countries was not affected.
“It’s data. Of course, there are errors,” said Gary King, a Harvard professor who co-chairs Social Science One. “This, of course, was a big error.”
King, director of the university’s Institute for Quantitative Social Science, said dozens of papers from researchers affiliated with Social Science One had relied on the data since Facebook shared the flawed set in February 2020, but he said the impact could be determined only after Facebook provided corrected data that could be reanalyzed. He said some of the errors may cause few or no problems, but others could be serious.
Social Science One shared the flawed data with at least 110 researchers, King said.
The group’s former co-chairman, Stanford Law professor Nathaniel Persily, said of the incident: “This is a friggin’ outrage and a fundamental breach of promises Facebook made to the research community. It also demonstrates why we need government regulation to force social media companies to develop secure data-sharing programs with outside independent researchers.”
An Italian researcher, Fabio Giglietto, discovered data anomalies last month and brought them to Facebook’s attention. The company contacted researchers in recent days with news that it had excluded roughly half of its U.S. users — a group that is likely less politically polarized than Facebook’s overall user base. The New York Times first reported Facebook’s error.
“This issue was caused by a technical error in our URL Shares Data Set, which we proactively told impacted partners about and are working swiftly to resolve,” Facebook spokeswoman Mavis Jones said.
The anonymized data set is one of the largest in social science history, with 42 trillion numbers. The set includes protections against individual users being identified based on what they have posted on Facebook, King said. He said the company began working more closely with researchers after the Cambridge Analytica scandal in 2018, but there have been tensions with researchers over how much information the company shares; it often cites privacy concerns when declining to provide data at the granularity researchers want for their work.
Cody Buntain, a member of the consortium and an assistant professor of informatics at the New Jersey Institute of Technology, said richer data from Facebook would have allowed researchers to discover the error sooner through routine checks. He said he was directly aware of several papers whose data now would need reanalyzing. There is no immediate timetable for Facebook to provide the corrected data, which is so large that it typically takes weeks to process.
“This is a totally foreseeable, preventable problem,” Buntain said.