Though it has been marketed as a one-stop warrantless law enforcement tool, Clearview’s client list is also reported to include casinos, gyms, supermarkets, sporting leagues and wealthy parents curious about their kids’ dates. The upshot? The fundamental comfort — and liberty — of being able to walk down a street or enter a supermarket or stadium without the authorities, or fellow strangers, immediately knowing who you are is about to evaporate without any public debate about whether that’s okay. It’s as if someone invented glasses that could see through walls, sold them to a select few, and everyone else inexplicably shrugged.
Now, the Wall Street Journal reports that Clearview AI is “in discussions with state agencies about using its technology to track patients infected by the coronavirus, according to people familiar with the matter.” It’s a savvy move, aimed at turning a rogue actor into a hero. This is a delicate time for privacy advocacy, which can feel less important during a public health crisis. Amid the upheaval, we must treat the company’s behavior as an opportunity to think about the responsibility of the online platforms from which it took its images. We should be doing that now, in the heat of this crisis, rather than waiting until another privacy-intruding start-up or two have completed their ravages. If we fail to act, today’s intrusions will cement themselves as tomorrow’s status quo.
The company’s services don’t represent a technological breakthrough as much as norm-shattering daring. Clearview simply added water to a recipe that no one else thought advisable to make, using existing ingredients. For example, Google has long offered a free “reverse image” search, making it easy to determine if a given photo is already online somewhere. State driver’s license databases could provide ample fodder for facial recognition tools, but they are protected by law from casual use. Gathering online images indiscriminately would replicate (and then some) the usefulness of a driver’s license database. Then, with facial recognition in play, a photo search need not look for photos identical to a sample, but rather for the same person who appears in it, however different other photos of that person may be. As platforms such as Facebook and LinkedIn have grown, they have offered a vast buffet of user-provided images, most associated with names and personal information (particularly on sites like Facebook that ban pseudonymity), to anyone with the technical know-how needed to scrape them.
And that’s exactly what Clearview AI did, harvesting more than 3 billion images from a wide range of sources. Collecting photos and identifying data in bulk from around the web, especially from the largest organized troves of them, is against most platforms’ terms of service. Were any big company to do it, it would likely result in years of litigation. The norms against such collection are so strong that many researchers wanting to scrape data from websites only for their own analyses and in the public interest are chary of doing it.
Clearview took a chance anyway. It’s selling what it took online to whoever can pay, including those who already wield the power of the state, except here unconstrained even by such guardrails as warrants and subpoenas. The platforms must shoulder some of the blame. Without the likes of Facebook, YouTube and Twitter, there could be no Clearview AI: They appear to have been eminently scrapable, asleep at the switch as the looters returned again and again. Indeed, the more apt analogy might be to a toxic waste site whose guard looked bemusedly on while an “entrepreneur” made repeated trips to haul away thousands of barrels.
Once the shoestring caper had been pulled off, it reportedly went on to attract funding from venture capitalists, including a member of Facebook’s board of directors.
Clearview AI is exploiting a long-standing vulnerability in the architecture of the platform economy: Data that we might comfortably make public individually can, in aggregate, lead to the equivalent of a police state. The world is better when people can, say, look you up by name if you like, but not look you up by face after simply glimpsing you somewhere, whether in person or, by happenstance, in a photo or video. And privacy is increasingly collective: Once available data reaches a critical mass, the emergence of services like those offered by Clearview AI becomes almost inevitable.
Given their apparent, if accidental, involvement in these developments, it’s stunning that the platforms haven’t reacted strongly to the Clearview AI revelations. Many apparent sources for Clearview AI’s scraping campaign, including Facebook, Google and Twitter, have, billions of horses out of the barn later, sent the start-up standard cease-and-desist letters over violations of their terms of service. But they have been notably muted in their public responses. In the days following the disclosure of the scrape, for example, Facebook released a wan statement: “Scraping people’s information violates our policies, which is why we’ve demanded that Clearview stop accessing or using information from Facebook or Instagram.” The public has heard little else of substance from the company or its peers about it since, including any accounting of just how much information might have been exfiltrated. That lack of information has severely limited the specificity of what might otherwise have been productive public discussions.
(When asked for any further comment, a Facebook spokesperson replied with a stronger statement: “Clearview AI’s actions invade people’s privacy which is why we banned their founder from our services and sent them a legal demand to stop accessing any data, photos, or videos from our services.” Google and Twitter did not offer further comments on the record.)
Perhaps more surprisingly, the platforms have caught little flak for their casual reactions. Why should the public trust social media platforms with personal data when they seemingly can’t protect it from wholesale abuse by one or two people? To be sure, legal action on the part of Silicon Valley behemoths may not be a slam dunk: The Ninth Circuit’s holding in the 2019 case hiQ v. LinkedIn, though decided on very different facts, might cast doubt on some of the most obvious arguments that would be made in litigation against people and entities that scrape publicly available data. But the lack of a swift and forceful legal response (or even meaningful public statements), whether from private companies or most public authorities, is unforgivable, given the level of trust that today’s social media platforms demand by their very design.
The platforms have been more than willing to deploy armies of lawyers to other ends, from grappling with the FTC to massive IP litigation. So why not when the interests of users are on the line? It wouldn’t be at all unreasonable to ask why the platforms won’t fight for us the way they fight for reduced fines or control over patents. If ever there was a time to be litigious, it’s now.
The largely unimpeded success of Clearview AI isn’t just a civil liberties disaster of seismic proportions; it vindicates a strategy of devastating hit-and-run attacks on our personal liberties by start-ups, underwritten by the silence of some of the world’s wealthiest and most powerful companies. Our best chance now might be aggressive litigation or broader regulation, whether inspired by Silicon Valley, forward-looking government agencies and state attorneys general (Vermont is currently leading the charge), or private citizens. Perhaps the typically more assertive European data privacy enforcers, largely silent to this point, will intercede, particularly if Clearview AI moves into the E.U. in earnest. Indeed, Clearview has so far offered opt-out forms only to California and E.U. residents, in response to the stronger privacy protections those jurisdictions afford.
Looking forward, we are long overdue for a reckoning with what good custodianship of public data entails. Part of that conversation should revolve around new and better data privacy laws, and part of it around platform practices. To start, the platforms need to be clearer with users about when and how their data becomes scrapable, and therefore usable by other entities, even when that scraping violates a platform’s terms of service, given that such terms clearly don’t prevent it. They’ll also need to flex their technical muscles alongside their legal ones, exploring new safeguards for what users offer publicly in the expectation of practical obscurity, to prevent tomorrow’s Clearview AIs from pulling off the same stunt.
In the case of Clearview AI, the time to implement these protective measures would have been months or years ago. But, should platforms and policymakers fail to take the fissures exposed by this round of scandal seriously, what we’re seeing will only be the start. We may soon find ourselves on the brink of a dark period of innovation in digital technology, one in which the missteps and carelessness of the platforms will pale in comparison to those of the ever-more-exploitative dot-coms they feed.