The Washington Post

Facebook hides data showing it harms users. Outside scholars need access.

The social media company has lost its right to secrecy

Former Facebook product manager Frances Haugen appears on “60 Minutes.” Haugen has been revealed as the source behind tens of thousands of pages of leaked internal company research. (Photo by Robert Fortunato/CBSNews/60MINUTES)

The disclosures by whistleblower Frances Haugen about Facebook — first to the Wall Street Journal and then to “60 Minutes” and Congress — ought to be the stuff of shareholders’ nightmares: When she left Facebook, she took with her documents showing, for example, that the corporation knew Instagram was making girls’ body-image issues worse, that internal investigators knew a Mexican drug cartel was using the platform to recruit hit men and that the company misled its own Oversight Board about having a separate content appeals process for a large number of influential users.

Facebook may be too big for the revelations to hurt its market position, a sign that it may be long past time for the government to step in and regulate the social media company. (Facebook’s stock has dropped about 4 percent since Oct. 1.) But for policymakers to effectively regulate Facebook — as well as Google, Twitter, TikTok and other Internet companies — they need to understand what is actually happening on the platforms.

Whether the problem is disinformation, hate speech, teenagers’ depression or content that encourages violent insurrection, governments cannot institute sound policies if they do not know the character and scale of what’s going on. Unfortunately, only the platforms have access to the relevant data, and as the newest revelations suggest, they have strong incentives not to make their internal research available to the public. Independent research on how people use social media platforms is clearly essential.

After years of frustration — frustration also felt by many Facebook employees trying to do the right thing — I resigned last year as co-chair of an outside effort to try to get the company to share more data with researchers. Facebook’s claims of privacy dangers and fears about another Cambridge Analytica scandal significantly hindered our work. (Facebook suspended data analytics firm Cambridge Analytica in 2018 for having improperly harvested and saved data about Facebook users; the federal government fined the company $5 billion in 2019 for failing to protect users’ privacy in that case and others.)

Facebook gave researchers access to some data about two years ago through this program. Several academics spent hundreds of hours mining it. But we learned this summer, after some had published their findings, that the dataset had significant errors, which were corrected only this week.

So we are now at a standstill: The public does not trust the research and data Facebook releases, and Facebook says existing law, including the Cambridge Analytica settlement, prevents it from sharing useful data with independent researchers. Congress could solve this problem by passing a law granting scholars from outside the social media companies access to the information the platforms hold — while protecting user privacy. I have drafted text for a law along these lines, which I call the Platform Transparency and Accountability Act.

Some models exist for analogous research on sensitive government databases, such as those overseen by the Census Bureau, Internal Revenue Service or Defense Department; and protocols exist, too, for studying biomedical and other highly personal data. But getting access to Facebook and Google’s data represents a challenge that is different in kind and degree. It’s not much of an exaggeration to say that almost all of human experience is now taking place on these platforms, which control intimate communications between individuals and possess voluminous information about what users read, forward, “like” and purchase.

Several ingredients seem important to ensuring the success of a new data-access regime for independent researchers. First, a government agency (most likely the Federal Trade Commission, which already investigates issues of online fraud and privacy violations) would have to be vested with sufficient power to police researchers’ behavior, as well as ensure platforms’ cooperation with projects the agency approves.

Second, the government itself should not have access to the data. The risk of surveillance and mission creep from law enforcement is simply too great. The data must stay within the firm’s control, but the FTC should specify in detail the procedures for accessing data and the requirements for facilities at firms — “clean rooms” — where outside researchers will analyze it. These would likely include recording every keystroke made by a researcher while accessing the data and vetting any potential publications to ensure no leaking of private information.

Third, the firm should have no power to decide which researchers will have access. That is for the FTC to approve. Toward that end, the agency should work with the National Science Foundation to develop rules, procedures and applications governing which researchers get the nod. Who counts as a researcher? It makes sense to start with scholars at universities, because universities have Institutional Review Boards to prevent ethics violations, and universities can be signatories to the relevant data access agreements. If it proves possible to legally define who counts as a “legitimate” journalist or think-tank scholar, perhaps access could eventually be expanded beyond professors.

Critics of the Silicon Valley companies often describe them as monopolies, referring to their scale and their power over the markets they (theoretically) compete in. But the most recent Facebook revelations underscore that they are also data monopolies: They have exclusive access to the information needed to understand the most pressing challenges to society.

The current situation — platforms controlling all their data and deciding what information the public deserves to know — is unsustainable. So, too, would it be a mistake for Congress to regulate the Internet based on folk theories or misguided conventional wisdom regarding the harms caused by these new technologies. We need good information.

As divided as Republicans and Democrats may be on exactly how these companies should be regulated, members of both parties should be able to come together to break these firms’ stranglehold on the information necessary for sound technology policy.