The Washington Post

New documents show how the NSA infers relationships based on mobile location data

(Vincent Diamante)

Everyone who carries a cellphone generates a trail of electronic breadcrumbs that records everywhere they go. Those breadcrumbs reveal a wealth of information about who we are, where we live, who our friends are and much more. And as we reported last week, the National Security Agency is collecting location information in bulk — 5 billion records per day worldwide — and using sophisticated algorithms to assist with U.S. intelligence-gathering operations.

How do they do it? And what can they learn from location data? The latest documents show the extent of the location-tracking program we first reported last week. Read on to learn more about what the documents show.

What’s the big deal? Information about where people go and when seems pretty innocuous.

The NSA doesn’t just have the technical capabilities to collect location-based data in bulk. A 24-page NSA white paper shows that the agency has a powerful suite of algorithms, or data sorting tools, that allow it to learn a great deal about how people live their lives.

Those tools allow the agency to perform analytics on a global scale, examining data collected about potentially everyone’s movements in order to flag new surveillance targets.

For example, one NSA program, code-named Fast Follower, was developed to allow the NSA to identify who might have been assigned to tail American case officers at stations overseas. By correlating an officer’s cellphone signals to those of foreign nationals in the same city, the NSA is able to figure out whether anyone is moving in tandem with the U.S. officer.

What kind of information is the NSA collecting?

Mobile devices reveal their locations in multiple ways.

When mobile devices are turned on and begin searching for cellular signals, they reveal their locations to any radio receivers nearby. As cellphones connect to cellular networks, they immediately register their locations to one or more databases maintained by telephone providers and clearing houses in order to allow calls to be made and received. These databases are known as Home Location Registers and Visitor Location Registers.

Registration messages often include a device’s ‘coarse’ location, at the level of a city or country, or a ‘finer’ position based on distance from a cellular tower (based on their VLRs). Most mobile operators also track phones precisely by triangulating their distance from multiple towers, for example to provide location-based emergency services.
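
As a rough illustration of how distances from several towers can fix a position, here is a minimal sketch in Python. It assumes a flat 2D map, exactly three towers and noise-free distance measurements; the function name `trilaterate` and the sample coordinates are hypothetical and are not drawn from the documents or from any operator's actual system.

```python
import math


def trilaterate(towers, distances):
    """Estimate an (x, y) position from three tower positions and distances.

    Subtracting the first circle equation from the other two leaves a
    2x2 linear system, which is solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = towers
    r1, r2, r3 = distances

    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2

    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        # Towers on a single line cannot pin down a unique position.
        raise ValueError("towers are collinear; position is ambiguous")

    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


# Hypothetical layout: three towers and a phone, in arbitrary map units.
towers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
phone = (3.0, 4.0)
distances = [math.dist(phone, tower) for tower in towers]

estimate = trilaterate(towers, distances)
print(estimate)
```

Real measurements are noisy, so a practical system would more likely combine more than three towers and fit a best estimate rather than solve the equations exactly.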

Many mobile devices and smartphones also use WiFi and GPS signals to fix their locations. These signals can also reveal a device’s location in a variety of ways, including location information leaked from its IP address, mobile apps and built-in location-based services. To help the NSA pinpoint the exact location of surveillance targets, a program called HAPPYFOOT intercepts traffic generated by mobile apps that send a smartphone’s location to advertising networks.

How are they collecting it?

The NSA has multiple, redundant methods to collect location data from the airwaves, Internet communications and underlying cellular infrastructure that connects the global network of mobile devices. The agency collects two kinds of information about mobile devices. Information collected from the phone network itself is known as Dialed Number Recognition (DNR) data. Information collected from data communications is known as Digital Network Intelligence (DNI).

The NSA collects Internet and phone metadata via relationships with corporate partners that allow the agency to intercept traffic at key network routing points. These partnerships take many forms. A company may transfer Call Detail Records to the NSA or allow the agency to install large-scale surveillance equipment to capture this traffic.

Communications among the world’s cellphones are supported by SS7, a global data protocol that links phone networks together, allowing phones to call one another even if they’re on different providers. Researchers have demonstrated that the SS7 network is vulnerable to surveillance, as is the related GRX system used to provide mobile data services to cellphones. We now know that the NSA is taking full advantage of that opportunity using sophisticated surveillance equipment such as JUGGERNAUT, which can process raw feeds between mobile carriers.

NSA surveillance programs are identified by “SIGADs,” short for “signals intelligence activity designators.” Internal NSA support documents reviewed by The Washington Post indicate that location information is collected via at least 10 SIGADs, including DANCINGOASIS, FAIRVIEW, MYSTIC, OAKSTAR, RAMPART-A, RAMPART-M, RESOLUTETITAN, STORMBREW, TIMBERLINE and WINDSTOP. Three of the SIGADs are believed to be located in the United States: FAIRVIEW, STORMBREW and TIMBERLINE, which is at Sugar Grove Research Station in West Virginia.

Finally, the NSA also employs traditional cellular radio collection, using Digital Receiver Technology equipment, typically located in embassies or aboard aircraft, to collect information from the airwaves.

How much information is being collected?

FASCIA is the NSA’s data warehouse for storing location metadata.

Documents show that about 5 billion records per day are being ingested into FASCIA. However, billions of records don’t necessarily mean billions of phones. Your mobile device sends a record of its location each time it connects to the network or moves between cellular towers. The frequency of these records will depend on factors such as the density of users in the area and how much the individual moves around. The more you move, the more times you update your location. So one device could be responsible for dozens of records in a single day.
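
To make the distinction between records and phones concrete, here is a minimal sketch in Python using made-up data. The device names, tower names and record layout are assumptions for illustration only and do not reflect how FASCIA actually stores records.

```python
from collections import Counter

# Hypothetical toy records: (device_id, tower_id, hour_of_day).
# Each entry stands for one location update sent by a device.
records = [
    ("device-A", "tower-1", 8),
    ("device-A", "tower-2", 9),
    ("device-A", "tower-3", 9),
    ("device-B", "tower-1", 8),
    ("device-A", "tower-1", 18),
]

# Count how many records each device contributed.
records_per_device = Counter(device for device, _tower, _hour in records)

total_records = len(records)
distinct_devices = len(records_per_device)

print(total_records, "records from", distinct_devices, "devices")
print(records_per_device.most_common())
```

In this toy data, a device that moves between towers accounts for most of the records, while a stationary one contributes a single entry.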

All this information is stored using Hadoop, an open-source software framework for storage and large-scale data processing that is derived from academic papers by Google engineers. The system underlies many of the largest data-processing projects in the world today.

Is the information being collected on American soil? Are Americans affected?

At least three of the stations for collecting location data are located in the United States. The NSA says that these collection systems are “tuned to be looking outside the United States.” But the agency does acquire a substantial amount of information on the whereabouts of domestic cellphones “incidentally,” a legal term that connotes a foreseeable but not deliberate result.

Part of the difficulty lies in the distributed nature of global telecommunications infrastructure. For example, billing information must be shared among wireless operators in order to accommodate users roaming between cellular networks. Also, Internet and radio (cellular) based collection mechanisms do not adhere to strict geographic boundaries.

Documents reviewed by The Washington Post indicate that some location information is collected in the United States. Specifically, one training document indicates that CHALKFUN, the standard interface to the NSA’s database, can be queried to ascertain whether a target is foreign prior to targeting. To do this, CHALKFUN has to collect information about devices located in the United States and must do so before the individual has been targeted. Excerpts from a training manual describe how to verify the location of a targeted device, including using HLR and VLR registrations in order to locate individuals.

Another document even references the American networks T-Mobile and Verizon as examples when discussing some of the limitations of their platform, although a senior U.S. intelligence official said that example was purely hypothetical.

Is that legal?

The documents make reference to a number of authorities under which this collection occurs, including Executive Order 12333, which is the president’s guidance for espionage under his own sole authority overseas. Courts have no jurisdiction and Congress does little oversight of this kind of intelligence collection. One of the challenges is that very little is known about how the intelligence community uses this presidential authority. Because it is solely an executive document, the president may rewrite it at will.

Additionally, metadata about U.S. persons collected in the United States can be used once it is captured. The Supplemental Procedures Governing Communications Metadata Analysis “enables the analytic to chain ‘from,’ ‘through,’ or ‘to’ communications metadata fields without regard to the nationality or location of the communicants, and users may view those same communications metadata fields in an unmasked form.” In plain English: the rules allow the NSA to hold information collected “incidentally” about U.S. persons and use it for analytics.

What does the NSA do with all the location data it collects?

Location information is useful for many purposes. For targeted surveillance, it’s useful for identifying the locations of wanted terrorists or foreign intelligence suspects. As described above, the NSA is also using it to ascertain the “foreignness” of potential targets prior to initiating targeted surveillance, even in the United States.

One of the most controversial uses of the bulk location information collected falls under a suite of tools called “Co-Traveler Analytics.” The document below outlines myriad analytics techniques that process bulk location information to identify people who may be of interest based on location behaviors they exhibit, regardless of whether or not they were previously a suspect.

Some of the techniques outlined in the document include:

CHALKFUN ANALYTICS: This tool seems to be the broadest in scope, processing all data collected in the FASCIA database, which contains cell-site location information (VLRs and GCIDs). It examines movements on a global scale in order to identify new suspects who might have shared similar movements with a person of interest, such as passing through the same location within a one-hour window. This analytic is currently being upgraded to a cloud-based system called R6 SOTINGLEAD in order to improve performance, as it operates on a very large dataset: 27TB per 7 months of collection.

DSD Co-Travel Analytic: Developed by the Australian surveillance agency, then called the Defense Signals Directorate (and more recently the Australian Signals Directorate), this technique examines mobile Call Detail Records containing location information in order to predict potential points of intersection -- projecting into the future all the individuals that may ‘cross paths’ with a given target. Plans are also underway to identify targets based on suspicious behaviors such as identifying mobiles that are turned off right before two people meet.

RT-RG Sidekicks: This cloud-based co-traveler analytic examines average travel velocity between pairs of travelers in order to determine whether it would be practically possible for the travelers to have traveled together.

SSG Common IMSIs Analytic: Identifies devices that were visible to cell towers in more than one country at once, to locate people crossing international boundaries.

The Cafe project: Uses DNI-based IP geolocation as a travel indicator, looking for when other individuals might have been seen in the same city as the target over a given time frame.
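
Two of the ideas in the list above, matching devices seen at the same location within a one-hour window and checking whether the travel speed between two sightings is physically plausible, can be sketched in a few lines of Python. This is an illustration on toy data under stated assumptions, not the NSA's actual method: the function names, the record layout and the 900 km/h speed ceiling are all hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

WINDOW_SECONDS = 3600  # one-hour window, as described for CHALKFUN ANALYTICS


def co_located_pairs(sightings, window=WINDOW_SECONDS):
    """Count, per device pair, the cells where both were seen within `window`.

    `sightings` is a list of (device_id, cell_id, unix_timestamp) tuples,
    a hypothetical layout chosen for this sketch.
    """
    by_cell = defaultdict(list)
    for device, cell, ts in sightings:
        by_cell[cell].append((device, ts))

    counts = defaultdict(int)
    for seen in by_cell.values():
        pairs_here = set()
        for (d1, t1), (d2, t2) in combinations(seen, 2):
            if d1 != d2 and abs(t1 - t2) <= window:
                pairs_here.add(tuple(sorted((d1, d2))))
        for pair in pairs_here:
            counts[pair] += 1  # at most one count per pair per cell
    return dict(counts)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def could_have_traveled_together(a, b, max_kmh=900.0):
    """Check whether sightings a and b, each (lat, lon, unix_timestamp),
    are reachable from one another at or below `max_kmh`.

    The 900 km/h ceiling is an assumed figure, roughly airliner speed.
    """
    dist_km = haversine_km(a[0], a[1], b[0], b[1])
    hours = abs(a[2] - b[2]) / 3600
    if hours == 0:
        return dist_km < 1.0  # simultaneous sightings must be about co-located
    return dist_km / hours <= max_kmh


# Hypothetical toy sightings: (device, cell, seconds since start).
sightings = [
    ("A", "cell-1", 0),
    ("B", "cell-1", 1800),
    ("A", "cell-2", 7200),
    ("B", "cell-2", 7500),
    ("C", "cell-2", 20000),
]
pairs = co_located_pairs(sightings)
print(pairs)
```

Comparing every pair of sightings within a cell is fine for a toy example, but it grows quadratically; that cost is one plausible reason the documents describe moving such analytics to cloud-based systems.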


ASDF: Data interchange format for DNI
CELL ID: This refers to a cell tower ID — a unique identifier for a cellular tower or cell site.
CELL SITE: Often referred to as a cell tower, a cell site includes antennas and other equipment that transmit and receive mobile device signals.
CDRs: Call Detail Records. Records of calls that may contain location data.
CHALKFUN: NSA's location query tool that accesses FASCIA, its vast database of device location information.
CNE: Computer Network Exploitation - The NSA's term for hacking into computers and computing equipment
DNI: Digital Network Intelligence - information collected from the Internet
DNR: Dialed Number Recognition - information from phones, both mobile devices and landlines.
DRT: Digital Receiver Technology - passive receivers used to collect GSM information from collection points (such as embassies)
EVILOLIVE: The NSA’s IP geolocation team
FASCIA: The NSA’s data "repository" or warehouse for location metadata
GCID: Global Cell-Tower ID - This is the unique number associated with any given tower. It acts as a proxy for location since
GSM: Global System for Mobile Communications
HAPPYFOOT: Analytic tool that aggregates leaked location-based service data to map the physical locations of IP addresses.
HLR: Home Location Registers are databases that maintain a user’s service information, including associations between cellular users and towers, which can be used to infer a user's location.
IMEI: International Mobile Station Equipment Identity. This is the unique serial number of the handset.
IMSI: International Mobile Subscriber Identity. This is the unique serial number of a user or device SIM card.
MSISDN: Mobile Subscriber Integrated Services Digital Network-Number - This is a unique number used to identify cell phones which usually includes the country code, provider code and phone number.
NGA: National Geospatial-Intelligence Agency
OCTAVE: This is the NSA universal targeting tool to enable expansion of selectors such as converting from a phone number to an MSISDN.
OCTSKYWARD: NSA's collection of GSM data from flying aircraft
OPC/DPC Pairs: These refer to the originating and destination points that typically transfer traffic from one provider’s internal network to another’s.
SIGADS: Signals Intelligence Addresses -- “signals intelligence activity designators” -- these describe the location where the collection occurs
SS7: Signaling System No. 7 - underlying protocol that handles most of the world’s mobile traffic
TAO: Tailored Access Operations. The team at the NSA responsible for targeted hacking and collection
TAPERLAY: The NSA's tool for looking up the registered location of a mobile device -- the provider and country where a phone was originally activated -- in the Global Numbering Database.
Thuraya: Global satellite phone provider
TUSKATTIRE: This is one of the many systems that the NSA uses for metadata processing to clean (dedupe, etc.) the data it’s ingesting.
VLR: Visitor Location Registers are databases that track current associations between cellular users and towers, which can be used to infer a user's location.


Here is a list of documents related to this story about the NSA's location-tracking programs.

How the NSA pinpoints a mobile device
This document is an internal NSA classification guide outlining the agency's abilities to intercept mobile cellular communications.

NSA signal-surveillance success stories
These slides contain excerpts from an April 2013 NSA presentation detailing signal surveillance techniques and successes.

What is FASCIA?
FASCIA is the NSA's enormous database containing trillions of device-location records that are collected from a variety of sources.

GHOSTMACHINE: The NSA's cloud analytics platform
Excerpts from slides describing the NSA's Special Source Operations cloud analytics platform, code-named GHOSTMACHINE.

How the NSA verifies a target's location
This is an excerpt from a transcript of NSA training videos that describe how to verify the location of a targeted device. It explains how the NSA monitors different types of mobile signaling information known as HLR and VLR registrations in order to locate individuals. It also makes clear that the agency is able to use location tracking to ascertain whether a target is in the United States.

How to tell if a target is 'foreign'
This is an excerpt from a National Security Agency training manual explaining how to determine if a targeted device is "foreign." It highlights a query for the past 60 days using the CHALKFUN location tool, which found "no roaming in the US."

Animation showing how the NSA's location tracking system works.

Ashkan Soltani is an independent researcher and consultant focused on privacy, security, and behavioral economics.