How accurate will these predictions turn out to be? Unfortunately, we have no way of knowing. When Stephens says Iran will “probably” get its bomb, does he mean that it will happen in the next six months, or the next 10 years? By how much do Brooks and Kristof think the chances of war, or of Iran getting the bomb, will increase? What is more, these pundits’ eagerness to hedge their bets — Brooks, for example, says that the deal only “may” increase the odds of war — limits the extent to which they can be held accountable should their predictions prove wrong. On foreign policy, opinion writers have figured out how to make bold statements while qualifying their claims just enough so that they can never be fully tested — and can never be proved wrong.
Without the possibility of being proven wrong, pundits are unlikely to change their minds, and readers can continue to seek out arguments that merely confirm their ideological inclinations. This way lies the closed-minded partisanship that afflicts so much of contemporary political discourse.
But is there an alternative?
Philip Tetlock, a professor of psychology and political science at the University of Pennsylvania, thinks there is. As he writes in his latest book, “Superforecasting,” out this week and co-written with Dan Gardner, “All we have to do is get serious about keeping score.” Tetlock’s idea is to host forecasting tournaments. Each week, pundits could develop a question about world events that both has a clear answer and tells us something important about global developments. Examples could include: Will Assad still be in power in six months’ time? Will there be a military exchange in the South China Sea in the next year? Will the number of terrorist attacks sponsored by Iran increase within one year of the removal of sanctions? Participants would be asked to assign explicit probability estimates to such questions, ranging from 0 (the event has no chance of occurring) to 1 (the event is guaranteed to occur). Their predictions could then be scored based on real-world outcomes.
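The standard way to grade such probabilistic forecasts, and the one Tetlock’s tournaments use, is the Brier score: the squared gap between the probability a forecaster assigned and what actually happened. A minimal sketch in Python (the function name and sample numbers are ours, for illustration only):

```python
def brier_score(forecast: float, outcome: int) -> float:
    """Squared error between a probability forecast and the real outcome.

    forecast: probability assigned to the event, from 0.0 to 1.0
    outcome:  1 if the event occurred, 0 if it did not
    Lower is better; 0.0 is a perfect forecast, 1.0 the worst possible.
    """
    return (forecast - outcome) ** 2

# A confident forecast scores very well when right...
print(round(brier_score(0.9, 1), 2))  # 0.01
# ...and is punished heavily when wrong.
print(round(brier_score(0.9, 0), 2))  # 0.81
# Hedging at 50/50 locks in a mediocre 0.25 no matter what happens.
print(brier_score(0.5, 1))            # 0.25
```

Note what the rule rewards: saying “may” about everything guarantees a middling score, so a forecaster can only beat the fence-sitter by committing to probabilities and being right.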
For four years, Tetlock has run such tournaments to test how accurately people can predict geopolitical events. Funded by the Intelligence Advanced Research Projects Activity, the intelligence community’s equivalent to DARPA, the tournaments required competitors — volunteers drawn from a wide range of careers, all with an amateur interest in politics — to make thousands of forecasts. Tetlock and his team have shown that it is possible to grade predictions about real-world events, and that some participants have verifiably better records than others. Using only publicly available information and a basic understanding of probability, “superforecasters” — those who performed in the top 2 percent of participants in the tournament — were able to predict future political events 30 percent more accurately than intelligence analysts with access to classified information. And this was no fluke: superforecasters from the tournament in year two increased their lead over other forecasters the next year, rather than reverting to the mean. These superforecasters were open-minded, skillful at distilling signals from the noise of everyday news as they updated their beliefs, and highly self-aware about their own potential biases and flawed assumptions.
The book and its findings have important implications for the U.S. intelligence community. If the CIA could increase the accuracy of its estimates by as much as 30 percent, it could seriously reduce the likelihood of multi-trillion-dollar intelligence failures, such as those that led to the Iraq War. But forecasting tournaments also offer the hope of improving public discourse in two connected ways: increased accountability and reduced partisanship.
First, accountability. While many pundits may not see themselves as being in the business of making forecasts, implicit predictions underpin much of what they write. It seems strange, then, that we know more about the accuracy of the predictions of Bill Flack, a superforecaster and retired irrigation specialist from Nebraska, than we do about the predictions of Paul Krugman or Bret Stephens. Bill Flack has never been invited to the White House, and his opinions are not broadcast by the New York Times or the Wall Street Journal, which reach millions of readers. When Krugman or Stephens write, laymen and decision-makers listen. With such power should come equal accountability.
And with public accountability may come reduced partisanship. Data from Tetlock’s forecasting tournaments has shown that making predictions that can be objectively proven or disproven reduces political polarization. Tournament participants who once held strong political views at either end of the spectrum moderated their opinions once they were graded solely on their accuracy. In most policy debates, pundits are motivated only in part by accuracy, and at least as much by other goals such as self-promotion, partisan allegiance, or contrarianism. When their success depends on a public image that projects confidence, pundits are unlikely to ever change their minds, even when the facts change. But when public scores of accuracy were recorded in Tetlock’s studies, tournament forecasters became more tolerant of nuance and less partisan. We predict the same would happen for pundits — and the benefits for public discourse would be enormous.
All that’s required is that, alongside the compelling narratives that opinion journalists are so good at writing already, they make one testable prediction a week. Pundits could then compete against the general public to see how accurate they are, and could learn from their own errors once clear data is gathered. Within a year we would have 50 data points by which to grade a pundit — without sacrificing what makes op-eds fun to read. And if their forecasts become a little more cautious and open-minded, so much the better. All we have to do is define the terms of each bet more clearly, and start keeping score.
Of course, it will be difficult to get writers to sign on. For one, opinion writers are rewarded by clicks, and cable news guests are rewarded by views. It’s more exciting to watch an overconfident polemicist than a careful analyst who hedges a claim with quantifiable probability estimates.
For another, established pundits have no incentive to put their money where their pen is. They’re incumbents, already at the top of the hierarchy, so making explicit predictions and participating in forecasting tournaments only introduces the risk of embarrassment.
These two issues are deep problems built into the incentive structure of opinion journalism. To change these norms, a few brave pundits will have to put their reputations on the line before there can be any sort of wider movement.
The consequences could be as far-reaching as they are profound. Just as modern medicine began when a farsighted few began to collect data and keep track of outcomes, to trust objective “scoring” over their own intuitions, it’s time now for similar demands to be made of the experts who lead public opinion. It’s time for evidence-based forecasting.
Sam Winter-Levy is an editor at Foreign Affairs and the former Von Clemm Fellow at Harvard University.
Jacob Trefethen is the former Henry Fellow at Harvard University’s Department of Economics.