By Ellen Nakashima and Alec Klein
Washington Post Staff Writers
Wednesday, February 28, 2007
The Department of Homeland Security is testing a data-mining program that would attempt to spot terrorists by combing vast amounts of information about average Americans, such as flight and hotel reservations. Similar to a Pentagon program killed by Congress in 2003 over concerns about civil liberties, the new program could take effect as soon as next year.
But researchers testing the system have likely already violated privacy laws by reviewing real information, instead of fake data, according to a source familiar with a congressional investigation into the $42.5 million program.
Bearing the unwieldy name Analysis, Dissemination, Visualization, Insight and Semantic Enhancement (ADVISE), the program is on the cutting edge of analytical technology that applies mathematical algorithms to uncover hidden relationships in data. The idea is to troll a vast sea of information, including audio and visual material, and extract suspicious people, places and other elements based on their links and behavioral patterns.
The privacy violation, described in a Government Accountability Office report that is due out soon, was one of three committed by separate government data-mining programs, according to the GAO. "Undoubtedly there are likely to be more," GAO Comptroller David M. Walker said in a recent congressional hearing.
The violations involved the government's use of citizens' private information without proper notification to the public and its use of the data for a purpose different from the one originally envisioned, said the source, who declined to be identified because the report is not yet public.
The issue lies at the heart of the debate over whether pattern-based data mining -- or searching for bad guys without a known suspect -- can succeed without invading people's privacy and violating their civil liberties.
DHS spokesman Larry Orluskie said officials had not yet read the GAO report and could not comment.
Another DHS official who helped develop ADVISE said that the program was tested on only "synthetic" data, which he described as "real data" made anonymous so it could not be traced back to people.
The system has been tested in four DHS pilot programs, including one at the Office of Intelligence and Analysis, to help analysts more effectively sift through mounds of intelligence reports and documents. In another pilot at a government laboratory in Livermore, Calif., that assessed foreign and domestic terror groups' ability to develop weapons of mass destruction, ADVISE tools were found "worthy of further development," DHS spokesman Christopher Kelly said.
The DHS is completing reports on the privacy implications of all four pilot programs. Such assessments are required on any government technology program that collects people's personally identifiable information, according to DHS guidelines.
The DHS official who worked on ADVISE said it can be used for a range of purposes. An analyst might want, say, to study the patterns of behavior of the Washington area sniper and look for similar patterns elsewhere, he said. The bottom line is to help make analysts more effective at detecting terrorist intent.
ADVISE has progressed further than the program killed by Congress in 2003, Total Information Awareness, which was being developed at the Defense Advanced Research Projects Agency (DARPA). Yet it was partly ADVISE's resemblance to Total Information Awareness that led lawmakers last year to request that the GAO review the program. Though Total Information Awareness never got beyond an early research phase, unspecified subcomponents of the program, which deal largely with foreigners' data, were allowed to be funded under the Pentagon's classified budget.
The Disruptive Technology Office, a research arm of the intelligence community, is working on another program that would sift through massive amounts of data, such as intelligence reports and communications records, to detect hidden patterns. The program focuses on foreigners. Officials declined to elaborate because it is classified.
Officials at the office of the director of national intelligence stressed that pattern analysis research remains largely theoretical. They said the more effective approach is link analysis, or looking for bad guys based on associations with known suspects. They said that they seek to guard Americans' privacy, focusing on synthetic and foreigners' data. Information on Americans must be relevant to the mission, they said.
Still, privacy advocates raise concerns about programs based on sheer statistical analysis because of the potential that people can be wrongly accused. "They will turn up hundreds of soccer teams, family reunions and civil war re-enactors whose patterns of behavior happen to be the same as the terrorist network," said Jim Harper, director of information policy studies at the Cato Institute.
But Robert Popp, a former DARPA deputy office director who founded National Security Innovations, a Boston firm working on technologies for intelligence agencies, said that research anecdotally shows that pattern analysis has merit. In 2003, he said, DARPA researchers using the technique helped interrogators at the U.S. prison at Guantanamo Bay, Cuba, assess which detainees posed the biggest threats. Popp said that analysts told him that "detainees classified as 'likely a terrorist' were in fact terrorists, and in no cases were detainees who were not terrorists classified as 'likely a terrorist.' "
Some lawmakers are demanding greater program disclosure. A bipartisan bill co-sponsored by Senate Judiciary Committee Chairman Patrick J. Leahy (D-Vt.) would require the Bush administration to report to Congress the extent of its data-mining programs.
Staff researchers Richard Drezen and Madonna Lebling contributed to this report.