Afghan security forces investigators are pictured through the wreckage of a burnt car at the scene of a suicide bombing in the Sarobi district of Kabul province Feb. 21, 2014 (Shah Marai/AFP/Getty Images).

Most of what we know about today’s wars we learn from the media. Conflict and violence are a firm part of the daily stream of information we consume. No incident, wherever it occurs, is too small to be reported straight to our screens at home. For conflict researchers, the advent of the information age should be good news: It is now possible to collect data on violence at unprecedented levels of detail and speed.

If we believe this, we flatly ignore generations of media research that attests to the strong presence of bias and selectivity in media-reported information. So, is this really the data source we as researchers should be relying on? Can we assume that the event lists we generate are anywhere close to what is happening on the ground? With a growing number of event data projects in political science, this is a question scholars cannot ignore.

There are two potential issues. First, reporting in the international news may be selective, in the sense that it covers some events but not others. Second, information in news reports may be inaccurate or simply false, making the coding based on these reports unusable for scientific analysis. My recent paper in the Journal of Conflict Resolution addresses the second issue, accuracy, by taking advantage of a rare chance to compare media-based conflict events with another dataset from a non-media source.

The recent availability of detailed military reports from conflict regions provides a unique opportunity to compare media-reported accounts of violence with versions of the same events as recorded by military forces. For the latter, I use data from 2008 to 2009 from the Significant Activities (SIGACTS) military database for Afghanistan, which was compiled by U.S. and Coalition forces. These incidents were matched with a set of events extracted from international news media by the Uppsala Conflict Data Program’s Georeferenced Event Dataset (GED) project. The GED usually relies on more sources than news media, but the preliminary Afghanistan coding that was made available to me only includes news media reports.

For the years 2008 and 2009, the matched dataset contains 1,077 pairs of events of lethal violence between Taliban militants and Coalition forces. I examined how well the two sources – military database and international news – agree on some of the “hard facts” about these events. In particular, I was interested in two pieces of information: the reported number of casualties of an event and the event’s location – the latter being especially important since most event datasets are now georeferenced and attach a geographic location to each entry. The focus on these hard facts alleviates some concerns about biased reporting in the military database, which serves as the reference in my study: SIGACTS locations are reported using GPS data, and there is little reason to manipulate them. The number of casualties is more sensitive information. While we could expect the nature of casualties to be subject to systematic errors (for example, civilian casualties could be recorded as insurgents), the total number should be less prone to bias. In general, the SIGACTS database should be less problematic in this regard since it was not created for public dissemination.

How much do the two datasets disagree, and what explains the disagreement? I find some interesting patterns. The “spatial error” of the media-based event codings – i.e., the distance between the media-reported location of an event and its true location according to the SIGACTS database – grows as we move away from densely populated areas. In other words, as fewer people witness an event, the accuracy of information the media can obtain about it decreases, resulting in a higher error. This interpretation is supported by the finding that roadside bomb attacks show a lower spatial error than incidents of direct violence (for example, small arms fire). Roadside bomb attacks will generally have more observers, since once they have occurred they are less dangerous to the public than, for example, shootings between insurgents and coalition forces.

More interesting, however, are the overall magnitudes of the spatial error and the error in the number of casualties. Figure 1 plots the cumulative distribution of the spatial error in the media-based event codings. The plot clearly shows the limits of spatial resolution in media-based event datasets. Only about half of all events are correctly located within 15 km of their true location. The bottom line here is not that media-based event codings are useless for spatial analysis, but rather that we should be treating the spatial information with a fair amount of uncertainty.

Figure 1: Cumulative distribution of the spatial error in media-based event codings (Nils B. Weidmann for Journal of Conflict Resolution)
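For readers who want to see how a spatial error of this kind can be computed, here is a minimal sketch in Python. It is not the code used in the paper: it assumes each matched pair comes with latitude and longitude for both sources, measures the error as a great-circle (haversine) distance, and uses hypothetical function names. The coordinates in the usage example are made up for illustration and are not taken from SIGACTS or the GED.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def share_within(errors_km, threshold_km):
    """Fraction of events whose spatial error is at most threshold_km.

    Evaluating this at many thresholds traces out a cumulative
    distribution of the spatial error.
    """
    if not errors_km:
        raise ValueError("need at least one spatial error")
    return sum(1 for e in errors_km if e <= threshold_km) / len(errors_km)


if __name__ == "__main__":
    # Hypothetical matched pairs: (media_lat, media_lon, reference_lat, reference_lon).
    matched_pairs = [
        (34.52, 69.18, 34.53, 69.17),
        (31.61, 65.71, 31.90, 65.40),
        (34.43, 70.45, 34.43, 70.45),
    ]
    errors = [haversine_km(*pair) for pair in matched_pairs]
    for threshold in (5, 15, 50):
        print(threshold, "km:", round(share_within(errors, threshold), 2))
```

Under these assumptions, the statement that only about half of all events fall within 15 km corresponds to `share_within(errors, 15)` being close to 0.5 for the full set of matched pairs.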

The error in the number of casualties also gives less reason to worry than we might think. Figure 2 shows a histogram of cases where the SIGACTS data and the GED agree on the number of casualties (high bar in the center), where the media overreports (left part of the histogram), and where it underreports as compared to the military dataset (right part of the histogram). The good news here is twofold. First, there is agreement in about half of all cases. Second, there is no evidence of systematic over- or underreporting by the media. Thus, suspicions of the media systematically inflating casualty numbers to create more sensational reports are not supported by the data.


Figure 2: Comparison of casualty numbers from the GED and the SIGACTS database (Nils B. Weidmann for Journal of Conflict Resolution)
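The comparison behind a histogram like this can be sketched in a few lines of Python. Again, this is an illustration under stated assumptions, not the paper's code: the function names are hypothetical, the difference is defined as the military count minus the media count (so negative values mean the media reports more casualties, matching the left side of the histogram), and the numbers in the usage example are invented.

```python
def casualty_differences(pairs):
    """Return military-minus-media casualty differences.

    Each pair is (military_count, media_count). A negative difference
    means the media reports more casualties than the military source.
    """
    return [military - media for military, media in pairs]


def summarize(diffs):
    """Shares of agreement, media over- and underreporting, and the mean difference."""
    if not diffs:
        raise ValueError("need at least one matched pair")
    n = len(diffs)
    return {
        "agree": sum(1 for d in diffs if d == 0) / n,
        "media_over": sum(1 for d in diffs if d < 0) / n,
        "media_under": sum(1 for d in diffs if d > 0) / n,
        "mean_diff": sum(diffs) / n,
    }


if __name__ == "__main__":
    # Hypothetical matched pairs: (military_count, media_count).
    example_pairs = [(3, 3), (2, 5), (4, 1), (0, 0)]
    print(summarize(casualty_differences(example_pairs)))
```

In this setup, agreement in about half of all cases shows up as an `agree` share near 0.5, and the absence of systematic over- or underreporting shows up as roughly balanced `media_over` and `media_under` shares with a mean difference near zero.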

While these results are at least partly encouraging for creators and users of media-based conflict datasets, more work is ahead for researchers. My comparison only focused on events that made it to the international news, but what about those that do not? What determines this selective reporting, and how does it affect the conclusions we draw based on media-based data collections of violence? Only after answering these questions will we be able to fully determine if our partial views on violence lead to impartial results.

Nils B. Weidmann is a professor of political science and head of the “Communication, Networks and Contention” research group at the University of Konstanz, Germany.