Data are a defining feature of modern society. Every day, humans and the machines they interact with create 2.5 trillion megabytes of data. As data become more prominent and readily available, the temptation to analyze them and make sense of the world through specific analytics methods or algorithms grows.
This is particularly true for national security. Big data is a “big deal” for U.S. spy agencies, which have long relied on multiple data sources to produce intelligence reports. In the past decade, agencies like the CIA and the NSA have institutionalized big data through the development of dedicated analytics units and research and development projects focusing on the analysis of online data such as YouTube videos and social media posts.
Big data has become a frequent subject of national security reporting and academic research, where experts often raise concerns about “big brother.” Civil liberties concerns often dominate the public debate on big data, and much less has been written on how big data tools contribute to national security.
This omission is a problem because if we don’t understand what big data does and how it’s used, we can’t evaluate its effects on national security decisions and, by extension, on civil liberties. In a new article appearing in International Affairs, we identify five ways big data analytics supports national security decision-making:
- Anomaly detection
Anomaly detection identifies items, events or observations that don’t conform to an expected behavior or pattern. This can be used to automatically assess whether an online activity is suspicious. For example, unusual activity by a trusted individual, or insider, whether malevolent or inadvertent, could be detected and distinguished from the background of everyday network activity.
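As a rough illustration of the idea, the sketch below flags values that sit far from the typical level of a series, using a modified z-score based on the median. The function name `flag_anomalies`, the 3.5 cutoff and the sample login counts are assumptions made for this example, not a description of any agency's tooling.

```python
from statistics import median

def flag_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so a single extreme value does not distort the baseline.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Simplification for this sketch: with no spread, flag nothing.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]

# Hypothetical daily login counts for one account; day 6 stands out.
daily_logins = [12, 15, 11, 14, 13, 12, 95, 13]
print(flag_anomalies(daily_logins))  # [6]
```

A flagged index is only a prompt for review: the score says the day was unusual, not why.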
- Association mining
Association mining algorithms discover interesting relationships and patterns hidden in large data sets. These relationships and patterns generally emerge from the frequent appearance of groups of entities, such as people, organizations and locations, in multiple documents, including reports prepared by intelligence officers and publicly available sources.
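One simple way to picture this is to count how often pairs of entities appear in the same document and keep the pairs that recur. The sketch below does only that; the function name `frequent_pairs`, the `min_count` parameter and the placeholder entities are hypothetical, and real association mining systems use more elaborate measures.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(documents, min_count=2):
    """Count entity pairs that co-occur in at least min_count documents.

    Each document is represented as a set of entity names.
    """
    counts = Counter()
    for entities in documents:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(entities), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Hypothetical entities extracted from four reports.
reports = [
    {"Person A", "Org X", "City Q"},
    {"Person A", "Org X"},
    {"Person B", "City Q"},
    {"Person A", "Org X", "Person B"},
]
print(frequent_pairs(reports))  # {('Org X', 'Person A'): 3}
```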
- Classification and clustering
One of the main contributions of big data tools to national security is in the domain of intelligence processing. Classification algorithms assign objects in a collection of data to target categories or classes. Classification models can be used, for instance, to label an intercepted phone call as part of a zero-, low- or high-risk activity.
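To make the notion of assigning items to classes concrete, here is a minimal sketch of a k-nearest-neighbors classifier, one common classification technique. The two numeric features, the labeled examples and the function name `classify` are invented for illustration and do not reflect how any real system scores calls.

```python
import math
from collections import Counter

def classify(point, labeled_examples, k=3):
    """Label a point by majority vote among its k nearest labeled examples."""
    nearest = sorted(
        labeled_examples,
        key=lambda example: math.dist(point, example[0]),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (feature vector, risk label).
examples = [
    ((1.0, 0.0), "zero"), ((1.5, 0.0), "zero"),
    ((2.0, 1.0), "low"), ((3.0, 1.0), "low"),
    ((8.0, 4.0), "high"), ((9.0, 5.0), "high"),
]
print(classify((8.5, 4.0), examples))  # high
```

The output is only as good as the labeled examples, which is one reason analyst feedback matters.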
Clustering refers to grouping objects or data points together based on notions of similarity. Such capabilities are ideal to sift through vast amounts of diverse social media data, organizing them in topic groups and generating summaries of their content for human consumption. Clustering can also help identify different types of social media users (opinion leaders, bots, etc.).
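The grouping step can be sketched as follows, assuming similarity is measured by word overlap (Jaccard similarity) and that any two posts above a chosen threshold belong to the same group. The function names, the 0.3 threshold and the sample posts are assumptions for this example; production systems typically rely on richer text representations.

```python
from collections import defaultdict
from itertools import combinations

def jaccard(a, b):
    """Share of words two posts have in common, between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_posts(posts, threshold=0.3):
    """Group post indices whose word overlap meets the threshold."""
    tokens = [set(p.lower().split()) for p in posts]
    parent = list(range(len(posts)))

    def find(i):
        # Follow parent links to the representative of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(posts)), 2):
        if jaccard(tokens[i], tokens[j]) >= threshold:
            parent[find(j)] = find(i)  # merge the two groups

    groups = defaultdict(list)
    for i in range(len(posts)):
        groups[find(i)].append(i)
    return sorted(groups.values())

# Hypothetical posts on two unrelated topics.
posts = [
    "flood warning river rising",
    "river flood warning issued",
    "election results tonight",
    "tonight election results live",
]
print(cluster_posts(posts))  # [[0, 1], [2, 3]]
```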
- Link analysis
Link analysis is most useful in defining, discovering and evaluating relationships between objects and data points. This type of algorithm is commonly used to identify nodes and networks connecting people, organizations and other entities. One of the most famous applications of link analysis is the identification of critical nodes in terrorist or criminal networks such as al-Qaeda through social network analysis.
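As one simple illustration of finding well-connected nodes, the sketch below computes degree centrality: the share of other nodes each node is directly linked to. This is only one of several measures used in social network analysis, and the function name `degree_centrality` and the toy edge list are assumptions made for the example.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Return each node's number of neighbors divided by (node count - 1)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(linked) / (n - 1) for node, linked in neighbors.items()}

# Hypothetical network: A is linked to three of the other four nodes.
links = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]
scores = degree_centrality(links)
print(max(scores, key=scores.get))  # A
```

A high score marks a node as well connected, not necessarily as important; that judgment still depends on what the links represent.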
- Machine learning
Machine learning refers to a special set of algorithms that can independently adapt and learn from the data they process, and synthesize newly appearing information. Machine learning algorithms can, for instance, retrieve hidden context from document collections, identify phishing attacks, detect network intrusion, recognize human faces and analyze crowds. In all these cases, big data facilitates intelligence analysis and sometimes automates security.
Big data requires human judgment
Big data already supports core intelligence functions such as data collection and processing, intelligence analysis and dissemination, and counterintelligence and security. The volume, variety and velocity of modern data streams have made big data tools indispensable to national security. Yet data analytics and algorithms are developed by humans, for human consumption, and can only be as useful as humans make them.
Big data is most useful when combined with human judgment. Machines and algorithms strip out much of the context in which humans interact. Some important national security insights, such as information on the intentions of foreign leaders, remain best identified by humans. Big data applications driven by machine learning algorithms also perform better when human analysts provide feedback to the system.
Big data can’t replace humans in their central role as producers and consumers of national security. At their best, analytics techniques free humans to do, or even help them do, what they do best: think, ask questions and make judgments. The future of big data and national security lies in humans’ ability to embrace the power and mitigate the limits of algorithms.
Damien Van Puyvelde is lecturer in intelligence and international security at the University of Glasgow. Find him on Twitter @DamienVP.
Stephen Coulthart is assistant professor of security studies at the University of Texas at El Paso.
Shahriar Hossain is assistant professor of computer science at the University of Texas at El Paso.