With the FBI’s reopened inquiry into the emails Hillary Clinton sent and received as secretary of state, Americans are once again considering what counts as negligence in how officials, and Clinton in particular, handle potentially sensitive information.
A recent Gallup poll found that the email controversy has dominated what voters hear about Clinton. A majority of those whom Rasmussen surveyed two weeks ago (53 percent) still think Clinton should have been indicted.
Was Clinton guilty of “gross negligence” in handling state secrets? Critics assert that allowing classified information on an unclassified system “is unheard of and a major criminal offense,” indictable under the Espionage Act. Defenders say that what gets defined as secret is “almost random,” and overclassification has run amok.
But what if we examined the question using data science, the discipline dedicated to understanding how data can be stored, classified, analyzed and protected?
At Columbia University’s History Lab, social scientists and data scientists have conducted many experiments to discover patterns and anomalies in official secrecy in large collections of declassified documents. We joined with collaborators at Fundação Getulio Vargas in Brazil, Renato Souza and Flavio Coelho, to see whether we could use data science methods to classify State Department communications.
We had two goals: First, find out whether, and to what extent, being classified as “secret” or “confidential” has historically been random or predictable. Second, learn what is normal and what might be considered negligent in how officials manage large numbers of potentially sensitive communications.
Here’s how we did it. Using machine learning, a type of artificial intelligence, we created algorithms that measure and compare features in a data set whose items have already been classified. In this case, the data consist of State Department communications, and the classes are secret, confidential and unclassified. High-performance computers systematically sort out what tends to distinguish these communications, whether that’s subject matter, senders and receivers, or the words in the message.
After “training” the algorithm in this way, we test it on a different set of communications. If any patterns distinguish more vs. less sensitive communications, the algorithm should be able to automatically rank those most likely to be classified as secret.
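The train-then-rank idea can be made concrete with a small example. We don't name the specific classifier here, so this minimal sketch swaps in multinomial naive Bayes, a standard text classifier built on word frequencies; it is an illustration, not necessarily the History Lab's method. The training "cables" and held-out texts are invented toy examples, and the function names `train_naive_bayes`, `class_probabilities` and `rank_by_secrecy` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy training data: (text, label). Invented for illustration;
# not drawn from the State Department corpus.
TRAIN = [
    ("arms control negotiations soviet delegation", "secret"),
    ("ceasefire talks informant report", "secret"),
    ("nuclear facility inspection sensitivity", "confidential"),
    ("embassy staffing budget request", "confidential"),
    ("civil aviation meeting schedule", "unclassified"),
    ("scientific conference travel reservations", "unclassified"),
]


def train_naive_bayes(docs):
    """Fit a multinomial naive Bayes model from (text, label) pairs."""
    class_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in docs:
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def class_probabilities(model, text):
    """Return P(label | text) for every label, using add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior for this class
        for word in text.split():
            if word in vocab:  # skip words never seen in training
                count = word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Turn log scores into probabilities that sum to 1 (log-sum-exp trick).
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(exp_scores.values())
    return {k: v / z for k, v in exp_scores.items()}


def rank_by_secrecy(model, texts):
    """Sort unseen texts from most to least likely to be 'secret'."""
    return sorted(texts, key=lambda t: class_probabilities(model, t)["secret"], reverse=True)


model = train_naive_bayes(TRAIN)
held_out = [
    "civil aviation schedule",
    "arms control talks informant",
    "embassy budget",
]
for text in rank_by_secrecy(model, held_out):
    print(f"{class_probabilities(model, text)['secret']:.2f}  {text}")
# 0.88  arms control talks informant
# 0.16  embassy budget
# 0.09  civil aviation schedule
```

On a real corpus the features would also include metadata such as senders, receivers and assigned keywords, and the vocabulary would run to tens of thousands of terms, but the principle is the same: words that appear disproportionately in secret cables push a new cable up the ranking.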
That’s essentially what Clinton and her aides were doing when deciding what they could send via email and what they needed to keep on secure systems. If AI methods prove reliable, they could be used to create a “recommender system” that would assist officials in classifying a message before it is sent.
We tested our method on declassified records from the 1970s
Of course, we couldn’t test our AI methods against Clinton’s communications, because we could not examine either the emails that reviewers later classified or the messages she and her team identified as sensitive and sent by a secure method. So we applied our approach to declassified data from the 1970s, when the State Department first began using electronic records. For this purpose, we acquired millions of State Department records from the National Archives.
Here’s what we found.
Does the State Department always protect classified communications on secure systems?
We found almost 48,000 “Secret” and “Confidential” cables that contain nothing but error messages — or just nothing. The National Archives states that these losses occurred when the cables were still at the State Department, whether because of technical problems or deliberate deletion. “Secret” cables were more than three times as likely as unclassified messages to go missing.
That left almost a million diplomatic cables from 1973 to 1978 with full text and many kinds of metadata. After extensive testing, we developed a matrix with 40,700 possible features for each of the 918,083 records.
How did our AI tool do at predicting the proper classification?
By calculating the relative frequency of different words in the message text and in the metadata, the algorithm could correctly identify 90 percent of the cables marked as “secret,” “confidential” or “limited official use.”
The keywords officials assigned to each cable were, on their own, enough to identify 84 percent of the classified communications. Such sensitive communications typically involved the most senior officials discussing subjects such as arms-control negotiations (as opposed to, say, civil aviation or scientific meetings).
In our experiment, fewer than 11 percent of the cables that our algorithm identified as likely to be classified had not actually been marked as classified. But many of these resulted from human error. They included cables that were classified as secret when received at the State Department but then resent to another post as unclassified, such as one reporting what Lebanese Christian leaders said about cease-fire negotiations with the Palestine Liberation Organization.
There were also many unclassified cables that, according to experts with security clearances, would have been highly sensitive at the time. These included, for instance, a report of what a confidential informant told U.S. diplomats about the kidnapping of the son of the president of Cyprus.
Meanwhile, about 16 percent of the cables the algorithm identified as unclassified were actually marked as secret, confidential or limited official use. But here again, inspecting these cables indicated a lot of human error, such as a mismatch between the metadata from the State Department database and the actual markings on the cables. Hundreds of cables had been mislabeled as unclassified, such as a report on Japanese government sensitivity about U.S. inspection of its nuclear facilities, in which the message text itself clearly showed it was meant to be confidential. Other cables that were labeled “secret” were almost certainly overclassified, such as miscellaneous travel reservations.
To be sure, apparent error can reflect disagreements among officials about what they need to classify, a perennial problem. A 2008 interagency report for the Office of the Director of National Intelligence found that there was “uneven guidance, misunderstanding, and a lack of trust between Intelligence Community agencies and mission partners concerning the proper handling and protection of information.”
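The three error figures in this section are easy to conflate because each uses a different denominator. As we read them, "90 percent" is recall (the share of marked cables the algorithm caught), "fewer than 11 percent" is a false discovery rate (the share of flagged cables that were not marked classified), and "about 16 percent" is a false omission rate (the share of cables predicted unclassified that were actually marked). This sketch shows how each is computed; the `ConfusionCounts` class is hypothetical and the counts are invented toy numbers, not the study's results.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Cable counts by (prediction, actual marking); 'positive' means classified."""
    tp: int  # predicted classified, marked classified
    fp: int  # predicted classified, marked unclassified
    fn: int  # predicted unclassified, marked classified
    tn: int  # predicted unclassified, marked unclassified

    def recall(self) -> float:
        """Share of marked-classified cables that the model flagged."""
        return self.tp / (self.tp + self.fn)

    def false_discovery_rate(self) -> float:
        """Share of flagged cables that were not marked classified."""
        return self.fp / (self.tp + self.fp)

    def false_omission_rate(self) -> float:
        """Share of cables predicted unclassified that were actually marked."""
        return self.fn / (self.fn + self.tn)


# Hypothetical counts for illustration only; not the study's results.
toy = ConfusionCounts(tp=90, fp=5, fn=10, tn=95)
print(f"recall={toy.recall():.3f}")                              # recall=0.900
print(f"false discovery rate={toy.false_discovery_rate():.3f}")  # false discovery rate=0.053
print(f"false omission rate={toy.false_omission_rate():.3f}")    # false omission rate=0.095
```

Because the denominators differ, a model can catch most classified cables (high recall) while still leaving a noticeable share of classified material among the cables it labels unclassified.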
Was Clinton negligent?
So was Clinton’s team negligent when they sent emails on an insecure system that other officials later deemed to be confidential, secret or top secret?
Clinton and her aides were probably as qualified to identify sensitive information as anyone else. But if we accept that the post-hoc reviewers were right in every case, that amounts to 2,115 misclassified emails, or about one or two for each day Clinton’s team was using the system.
Some think this is a lot. But it is less than four percent of the 54,149 individual messages (the oft-cited figure of 30,000 emails is actually the number of email threads, many of which contain multiple messages).
So Clinton and her team generated about 37 emails a day, and one or two of them should have been classified. But they were also generating an untold number of communications using secure systems, including cables, secure telephone calls, and secure fax messages.
Let’s assume, conservatively, that the secretary of state had no more communications than an average office worker: 121 emails a day, not counting phone calls, snail mail and so on. Out of that total, would underclassifying one or two a day be a high error rate? It’s a better record than we were able to achieve with “big data” and high-performance computers.
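The back-of-the-envelope arithmetic above can be checked directly. This sketch uses only figures quoted in the text; the number of days is inferred from the 37-emails-a-day estimate rather than taken from any official count, so the per-day results should be read as approximate.

```python
# Figures quoted in the text.
MISCLASSIFIED = 2_115        # emails later deemed classified by reviewers
TOTAL_MESSAGES = 54_149      # individual messages, not threads
EMAILS_PER_DAY = 37          # the text's estimate for Clinton's team
OFFICE_WORKER_PER_DAY = 121  # average office worker's daily email volume

share_misclassified = MISCLASSIFIED / TOTAL_MESSAGES
implied_days = TOTAL_MESSAGES / EMAILS_PER_DAY        # inferred, not official
misclassified_per_day = MISCLASSIFIED / implied_days
share_of_daily_volume = misclassified_per_day / OFFICE_WORKER_PER_DAY

print(f"{share_misclassified:.1%} of messages")            # 3.9% of messages
print(f"{implied_days:.0f} days implied")                   # 1463 days implied
print(f"{misclassified_per_day:.2f} per day")               # 1.45 per day
print(f"{share_of_daily_volume:.1%} of a 121-message day")  # 1.2% of a 121-message day
```

The implied span of about 1,463 days is roughly four years, in line with Clinton’s time in office, and about 1.45 misclassified emails a day is where the “one or two” figure comes from.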
What’s the State Department’s usual error rate in classifying communications?
Clinton can’t be considered negligent until we know how her record compares with the error rate for the rest of the State Department. Clearly, officials make errors in identifying sensitive information. But even though the government spends more than $16 billion a year guarding official secrets — almost 40 times more than it allocates to answer (or not answer) Freedom of Information Act requests — it has never studied to what extent officials agree on what they should keep secret, and how reliably they protect these secrets. Without that kind of research, we simply cannot know whether Clinton was better or worse than average in recognizing sensitive information and protecting it on a secure system.