This is the fifth contribution in our mini-symposium on what policymakers can learn from recent academic research into nuclear weapons. Matthew Connelly is a professor of history at Columbia University. He examines how computational advances can help historians make sense of a treasure trove of new data.
But are historians being a little atavistic when we respond to studies based on 52 or 210 cage matches by focusing on the one we happen to know about? In fact, this tendency, however annoying, is the reason why social scientists tend to trust our work. We are committed to “getting it right,” even if it requires interviewing dozens of participants and poring over thousands of documents. Like Frank Gavin, I believe reconstructing complex historical events is excellent training for future policymakers and citizens. Is it not more practical than learning how to code wars on a scale of one to five, or making policy choices based on whether this or that correlation meets some arbitrary test of statistical significance?
To help students understand why good historical scholarship takes time, and why it is worth it, I sometimes ask how many have witnessed some event that made the news. With many hands up, I then ask who among them thought the reporter got the story completely right. Invariably, the hands sink and smiles break out as students consider how much is wrong not just in that one story, but in many other stories in the media. If you asked a room full of diplomatic historians the same question about “large-n” research, they would have a similar reaction: the coding of cases they know best is crude, if it is not false, and the causal mechanisms have little or no basis in the historical record. Rather than reassuring us that the errors are randomly distributed, this leads us to doubt the whole enterprise.
But the political scientists in Gavin’s cage match ask good, tough questions about whether historians can be confident that they have “gotten it right,” and why it is wrong to try to develop methods that might eliminate alternative explanations. I share many of their concerns about the opacity and non-reproducibility of historical research, and I am not sure Gavin has fully answered them. For instance, is there a historian who has not, at some point, cited a newspaper story to support some crucial point? Since Thucydides, good historians have tried to find all the available sources and determine which ones are most credible. But we inevitably pick and choose those accounts that appear to add up to the most plausible explanation.
As critics like Duncan Watts and Nassim Taleb point out, implausible things happen all the time. If we cannot actually run experiments, how can we eliminate different interpretations? Moreover, taken as a whole, the archive is intrinsically biased. Policymakers of the past created a record that reflected their own understanding of events, and policymakers of the present systematically withhold evidence that would cause embarrassment.
Historians have answers to these objections. Unlike some political scientists, we make predictions about the future: Rigorous history constitutes a prediction that newly uncovered sources will only confirm the essential truth of the author’s argument. And we are constantly scouring the world to find new sources, since we know that the best way to correct for the bias in one country’s official record is to explore the records of all the rest.
Alas, a powerful new trend is beginning to undermine this whole enterprise: what William McAllister calls “The Documentary Big Bang.” We are just beginning to see the first wave of electronic records from the 1970s, and it is already overwhelming archivists. Rather than neatly arranged papers in Hollinger boxes, reproducing the original files of the officials who worked with them, these records are coming in massive data dumps and in no discernible order. For instance, we now have some 1.4 million diplomatic cables from the years 1973-1977, all of them text searchable. But without a finding aid or any sense of the original filing system, we have no obvious way to determine what it is we are sampling. How will we cope with the more than 40 million e-mails produced during the Clinton administration, much less the two billion e-mails that are now produced every year by the State Department?
While Frank Gavin and Hal Brands are impressed with all the documentation that is now available, we should not just compare it with what we had before. We need to think about how much more we might be missing. Consider, for instance, that in 2013 the Department of Defense declassified less than 23 percent of the pages it reviewed for “automatic” declassification, and the CIA released barely 20 percent (not counting the Agency’s clandestine branch, which is not even covered by the program). The underfunded declassification system is already breaking down, with far fewer records released in recent years than in the 1990s. It may collapse when it has to deal with the much larger volume of classified records that come with the proliferation of e-mails, text messages, and video conferences. And all this assumes that these records will actually be preserved and eventually transferred to the National Archives. In fact, some government agencies cannot even produce e-mails written three years ago.
Either the archive will expand exponentially, or – more likely – we will have access to a smaller and smaller part of the historical record. If we become increasingly dependent on the search engine, we are at ever greater risk of cherry-picking our evidence. The idea that even multi-archival, international research can produce a complete and accurate representation of past events may therefore become increasingly illusory.
But this challenge also presents an opportunity, provided that social scientists team up with data scientists and develop new methods to mine these records. With something approaching a complete archive of U.S. cable traffic, for instance, we can test historical claims with greater rigor. The relative frequency with which diplomats talk about this or that country helps show whether it was a priority for U.S. foreign policy. When combined with the archives of other countries, it might provide a new way of measuring the international agenda, or even the rise and fall of great powers. We can also pose entirely new questions that will help us understand what we are missing, using techniques like burstiness detection to model crises – and non-crises – and topic-modeling and named-entity extraction to identify subjects that are more likely to remain classified.
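To make the first of these ideas concrete, here is a minimal sketch of how one might measure the relative frequency of country mentions, year by year, in a collection of cables. It is an illustration under stated assumptions, not a description of any existing project's method: the four sample cables, the `COUNTRIES` watch list, and the `mention_shares` function are all hypothetical, and a real corpus would be loaded from an archive rather than typed in.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: (year, cable text) pairs invented for illustration.
cables = [
    (1973, "Embassy reports on talks with Egypt and Israel."),
    (1973, "Egypt requests further consultations."),
    (1974, "Trade discussions with Japan continue."),
    (1974, "Japan and Egypt raised at the ministerial meeting."),
]

# Assumed watch list of countries to track.
COUNTRIES = ["Egypt", "Israel", "Japan"]


def mention_shares(cables, countries):
    """Return {year: {country: share of that year's cables mentioning it}}."""
    totals = Counter()            # number of cables per year
    hits = defaultdict(Counter)   # per-year count of cables mentioning each country
    for year, text in cables:
        totals[year] += 1
        lowered = text.lower()
        for country in countries:
            # Crude case-insensitive substring match; count each cable once.
            if country.lower() in lowered:
                hits[year][country] += 1
    return {
        year: {c: hits[year][c] / totals[year] for c in countries}
        for year in totals
    }


shares = mention_shares(cables, COUNTRIES)
for year in sorted(shares):
    print(year, shares[year])
```

Dividing by the number of cables sent each year matters because the volume of traffic changes over time, so raw counts alone would be misleading. Substring matching is deliberately simple and would miscount names contained in other names (“Niger” inside “Nigeria,” for example), which is one reason the named-entity extraction mentioned above would be preferable in practice.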
Unlike a lot of the quantitative data used in social science research, computational history would not depend on coding by research assistants or self-reporting in surveys. We do not, in other words, have to rate wars on a scale of one to five, or fall into the fallacy of equating attitudes recorded in polls with actual behavior. We have the immense advantage of using primary sources, the raw stuff of history, only now with the possibility of very large-n analysis. Even if we can never have the whole corpus, we may have enough of it to mitigate the selection bias and out-of-sample issues that bedevil a lot of international relations research. And rather than limit ourselves to running regressions, we can use an array of computational methods and combine them with close reading of sources.
To be sure, not all research questions will reward these methods. But for those who work on contemporary political history, multi-disciplinary collaboration of the kind Scott Sagan calls for may be increasingly important. In the fight to ensure adequate funding for archival preservation and declassification, it is absolutely essential. If we can save this data and make it accessible, teams of social scientists and data scientists can start to run ambitious new experiments, with replicable methods and data made available for inspection. In a field like nuclear history, we might start predicting what, specifically, newly declassified documents will reveal, and specify our confidence in making these predictions. Others will be able to judge how, specifically, we were wrong. More importantly, they will be able to develop models that produce more accurate predictions. But this future requires that historians and political scientists come out of their monkey cages, stand erect, and start working together.