The article implied that agents of the United States government could have foreseen something like the San Bernardino attack by paying better attention to the social media warning signs. Carly Fiorina, seizing on this news item in the Republican presidential debates, claimed that we failed to prevent the Boston Marathon attacks because “we were using the wrong algorithms.”
But writing computer programs to scrape social media and identify dangerous applicants will not be easy, cheap, or straightforward. Here’s why.
Even in Ukraine’s crisis, its social media chatter was almost entirely about traffic, celebrities or weather
First, the vast majority of activity on social media is apolitical, even during periods of violent political upheaval. In a study of competing political narratives in Ukraine around the Euromaidan events and the seizure of Crimea, we searched through 5 million geo-tagged tweets originating from within the territory of Ukraine during months of existential political crisis. Almost all were about traffic, celebrities or the weather. Discovering whether a visa applicant has ever voiced suspect opinions will require searching through acres of haystacks in the hopes of finding a few needles.
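To make the needle-in-haystack problem concrete, here is a minimal sketch of the kind of keyword filter such a search relies on (the watchlist, sample tweets, and function names below are invented for illustration, not our actual research code):

```python
# Toy sketch of a keyword filter over a tweet stream. The watchlist and
# the sample tweets are invented for illustration only.
POLITICAL_KEYWORDS = {"euromaidan", "crimea", "fascist", "nazi"}

def is_political(text: str) -> bool:
    """Flag a tweet whose words include any watchlist keyword (case-insensitive)."""
    return any(word in POLITICAL_KEYWORDS for word in text.lower().split())

tweets = [
    "So much traffic on the bridge again today",
    "Can't believe this weather, third storm this week",
    "Protesters gathering at Euromaidan tonight",
    "Did you see what she wore to the premiere?!",
]
flagged = [t for t in tweets if is_political(t)]
# Only one of these four sample tweets is flagged; in a real stream the
# ratio is far more lopsided.
```

Even this crude filter makes the scale problem visible: the interesting tweets are a sliver of the stream, and everything else must still be processed and discarded.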
Hate speech is common, and terrorism is exceedingly rare
Second, identifying suspicious social media activity cannot be conclusive without additional labor. Whittling hundreds of thousands of flagged accounts down to a manageable watchlist will be an expensive and time-consuming human effort, not the work of algorithms.
The real behaviors that state agents are interested in predicting are actual propaganda-by-deed terrorist acts. But hate speech is commonplace, and political terrorism is exceedingly rare. Every use of a flagged keyword that does not precede a terrorist act adds to a vast amount of noise obscuring a minuscule signal.
How agents of state security bureaucracies are supposed to sort the insurrectionist wheat from the noisy chaff is not clear. Our experience with fuzzy-string searches suggests that the first cut of answers will involve profiling. Muslim teens on Facebook “liking” the wrong kind of hip-hop are sure to end up on lists.
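As a rough illustration of how blunt that first cut can be, consider a fuzzy string match using Python's standard-library `difflib` (the keyword, threshold, and examples are our own assumptions, not any agency's actual method):

```python
from difflib import SequenceMatcher

def fuzzy_hit(token: str, keyword: str, threshold: float = 0.7) -> bool:
    """Flag a token whose spelling is 'close enough' to a watchlist keyword."""
    return SequenceMatcher(None, token.lower(), keyword.lower()).ratio() >= threshold

# A near-exact use is caught, as intended...
print(fuzzy_hit("terrorists", "terrorist"))  # True (ratio ~ 0.95)
# ...but so is an entirely innocent word that merely looks similar.
print(fuzzy_hit("tourist", "terrorist"))     # True (ratio = 0.75)
```

Loosen the threshold enough to catch misspellings and you start flagging tourists; tighten it and you miss the variants you were loosening it to catch.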
But the problem of ubiquitous false-positives remains. Thousands of angry people use the Internet to proudly declare their support for domestic terrorism every single day. Since everyone understands that most of it is just cheap talk, it is protected speech.
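The arithmetic behind this base-rate problem is easy to sketch. Every number below is an invented assumption for illustration, not an estimate of any real classifier's performance:

```python
# Back-of-envelope base-rate arithmetic with illustrative assumptions:
# a screening classifier that is 99% accurate, applied to a population
# in which 1 in 1,000,000 accounts actually poses a threat.
population = 100_000_000       # accounts screened
base_rate = 1 / 1_000_000      # fraction of accounts that are true threats
sensitivity = 0.99             # P(flagged | threat)
false_positive_rate = 0.01     # P(flagged | no threat)

true_threats = population * base_rate                            # 100
true_flags = true_threats * sensitivity                          # 99
false_flags = (population - true_threats) * false_positive_rate  # ~999,999
precision = true_flags / (true_flags + false_flags)
# Roughly 10,000 false alarms for every genuine hit: precision below 0.01%.
```

Under these (generous) assumptions, a human analyst would have to clear about ten thousand innocent accounts to find one real threat, which is why the watchlist-whittling stage is labor, not algorithms.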
Retweeted hate speech might be mockery, bots, or news accounts
In Ukraine, it was not difficult for us to identify Russian-language tweets containing divisive keywords (fascist, Nazi, terrorist, etc.). But laborious manual review then revealed that a great deal of the hate speech was generated either by media professionals (news accounts, bots, journalists, academics) or by ironic “hate-link” retweets (individuals mocking opponents’ arguments).
Even though a war was breaking out and many young men were joining up, we doubt that our data would have let us predict who in our Ukraine sample was most likely to join a militia or put on an explosive vest.
Selfies posed with guns are not all that informative compared with the kinds of evidence that emerge from sustained surveillance and human intelligence. The “doers” may be systematically different from the “talkers.” The “doers” are probably less likely to have Twitter and Instagram accounts in the first place and, when they do, more likely to know how to adjust their phones’ privacy settings and turn off geolocation.
But maybe that is just because our small lab lacked sufficient resources? If we could obtain vast records of social media behavior on everyone, and had techniques to search through them quickly, perhaps our ability to make fine-grained predictions of this sort would improve? Perhaps.
Trying to make sense of all this data has tremendous costs
But there are costs to pushing in this direction. Many companies – including Dataminr, Gnip (now part of Twitter), Sifter, and Crimson Hexagon – already offer a range of services to commercial and governmental clients based on access to complete data from multiple social media sources.
Transforming these data streams into something that could actually screen visa applications requires enriching them with metadata (such as social connections or demographics). Unless social media companies provide raw data or suspend rate limits for governments, adding this metadata in real time, at a scale large enough to retroactively screen millions of visa applicants, is a formidable engineering challenge. And social media companies are hesitant to take either of those steps.
We know such efforts are technologically feasible. We would be genuinely shocked if many subcontractors were not already inking multi-year funding proposals.
But the costs can be measured in tax dollars, diminished civil liberties, and eroded expectations of privacy. And it will signal to billions across the world that the United States government is reading over their shoulders, lending credence to a conspiracy theory that already hurts American public diplomacy efforts abroad.
Terrorists will figure out how to evade detection
Finally, the dangerous terrorists in the future may be systematically smarter and more sophisticated about evading search algorithms than they have been in the past.
Yes, the Boston Marathon bombers were hiding in plain sight, but it will not be like that every time. Imagine a regime in which every visa applicant was required to provide his or her social media screen names. This is not synonymous with giving the government access to an applicant’s social media history. An applicant could easily provide a false screen name or keep multiple screen names and identities — some encrypted, some not — and just not report all of them.
If “clean accounts” become a valued commodity, it is easy to foresee a secondary market in the creation and sale of banal social media histories. (“$30,000 buys you ‘email@example.com,’ including an 8-year backlog of NBA score searches with no tagged words.”)
The perpetrators of the San Bernardino shootings certainly worried they were being watched and took steps to avoid detection. CNN first reported that Malik used a pseudonym and enabled privacy settings that ensured very few people could see her messages.
Even more recently, it has come to light that the messages were “private direct messages,” not public social media posts at all. The government could obtain private data by working cooperatively with social media and telecommunications companies, though at present the major players seem reluctant to turn over data at the scale required. (And for dedicated civil libertarians, the intimation that political correctness has limited the ability of the United States’ national security infrastructure to tap phones and read emails remains absurd on its face.)
The government could force companies to comply, of course, in the name of stopping terrorists at the border. But is the juice actually worth the squeeze?
Real security requires police work
On the one hand, calls for preventive action against terrorists are motivated by a valid fear. On the other hand, a policy shift toward using the digital archive of social media to conduct real-time background checks as a normal part of the visa application process could easily backfire. Adaptive behaviors, like getting off the Internet entirely, are probably more likely for the real terrorist threats than for all of the noisy false-positives. These adaptations are more likely the longer a policy shift is debated in the public sphere.
But the larger point is that counterterrorism is a cat and mouse game involving very sophisticated players that traffic in secrecy.
Our strong hunch is that even if the United States government had total access to all kinds of social media data, and the luxury of a great deal of time and computational power to sort through it all, it probably would not matter.
Genuine security measures will still require a great deal of multilingual police and intelligence work, as well as government and inter-agency cooperation, all over the planet.
Zachary C. Steinert-Threlkeld (@ZacharyST) is a PhD candidate in the department of political science at the University of California at San Diego.