As part of his ongoing investigation into the Internal Revenue Service's possibly improper scrutiny of tax-exempt but politically tinged groups, Rep. Darrell Issa (R-Calif.) reported last month that the IRS in 2010 turned over a massive cache of nonprofit groups' financial filings that included, wrote Issa, "legally protected taxpayer information that should not have ever been sent."
It's not a hugely unexpected development; in 2012 the Chronicle of Philanthropy reported that almost one in five nonprofit groups include Social Security numbers on the filings they submit to the tax agency.
But this week, a tiny fleet meant to help address the problem is arriving in inboxes all over Washington. It is coming in the form of USB flash drives shaped like George Washingtons and Abe Lincolns. The drives and accompanying letters are being sent to Issa and other congressional investigators, as well as President Obama, IRS Commissioner John Koskinen and several other administration officials. The drives and letters hold heaps of evidence of the errors (some 9,392 Form 990 filings from nonprofits that were released with Social Security numbers, now scrubbed) and suggestions on how to prevent future data messes.
The coordinated mass mailings are the work of Carl Malamud, a long-time advocate for boosting the public's access to government information. Malamud, a Californian who works under the banner of Public.Resource.Org, has spent years digging through the IRS's Exempt Organizations database in a bid to make more public what he calls "a vital source of market information for one of our most important economic sectors." The exposure of so many Social Security numbers irks him, he said, but Malamud is not one given to stewing. In the 1990s, he fought to put the Securities and Exchange Commission's records online. He's also found success in nudging everything from C-SPAN videos to state building codes toward being freely available on the Internet.
Now Malamud's turned his attention to the IRS and its information technology troubles. "I have heard a number of convoluted explanations as to why the IRS cannot deal with this massive privacy breach," writes Malamud in his letter to Koskinen. "None of those explanations are convincing."
At the moment, Koskinen has much on his plate. For one thing, he's being prodded by Issa's oversight committee to clarify recent testimony about e-mails that seem to have disappeared from the computer system of Lois Lerner, the then-head of the exempt organizations division. But now is hardly the moment for a battered agency to retrench, writes Malamud. The agency should start, he says, by dedicating the resources necessary to pulling down and redacting the tax-exempt database. "Tough times demand more action," Malamud advises Koskinen, "not less."
Malamud is finding an ally in Darrell Issa, the IRS's most vigorous congressional pursuer. Asked about Malamud's project, Issa said via e-mail that while the IRS seems to have an "aversion" to healthy tech practices common in the private sector, it's not about a lack of financial resources. "It's about leadership, management, and sometimes a bureaucratic resistance to transparency." Indeed, argued Issa, "these improvements could cost far less than the IRS would have us believe."
Malamud has some ideas on that front. "This is a people problem, not a technology problem," he diagnoses, "and I would strongly urge the IRS to reach out and ask for help."
To get the ball rolling, Malamud has reached out to Todd Park, who is both the chief technology officer of the United States and the administration figure widely credited with leading the rescue of HealthCare.gov. In working with the scanned TIFF images that populated the IRS's database (built on "a shockingly bad IT code base of Windows XP"), Malamud says, he's noticed a lack of good, scalable tools for either converting images to text or blacking out sensitive information. Free and open-source components for both tasks are available online. But "nobody has put them together in a way that would be useful to an organization such as the Internal Revenue Service as they process several million returns."
Perhaps, suggests Malamud, Park's team could help.
Such a home-grown redaction tool could be used by other federal agencies and offices to improve their data hygiene. Malamud points to how, in 2010, the Office of the Federal Register piggybacked off an existing open-source project and partnered with outside developers to launch a new digital version of the Federal Register.
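To make the redaction idea concrete, here is a minimal sketch in Python of only the text-side half of such a tool. It assumes the scanned pages have already been converted to text by some OCR step, and it masks strings formatted like Social Security numbers. The pattern, the function name `redact_ssns` and the sample text are illustrative assumptions, not anything Malamud or the IRS has described; a real tool would also need to black out the matching region in the image itself and cope with OCR errors.

```python
import re

# Assumed pattern: SSN-like strings written as 3-2-4 digits with a hyphen
# or space between groups. Groups that are never issued (000, 666, 9xx,
# 00, 0000) are excluded to cut down on false positives.
SSN_PATTERN = re.compile(
    r"\b(?!000|666|9\d\d)\d{3}[- ](?!00)\d{2}[- ](?!0000)\d{4}\b"
)


def redact_ssns(text, mask="XXX-XX-XXXX"):
    """Return (redacted_text, number_of_matches) for OCR'd page text."""
    return SSN_PATTERN.subn(mask, text)


# Hypothetical sample line, as it might come out of an OCR pass.
sample = "Officer: Jane Doe, SSN 123-45-6789. EIN 12-3456789."
redacted, hits = redact_ssns(sample)
print(redacted)
print(hits)
```

Because the pattern requires the 3-2-4 grouping, the employer identification number in the sample (written 2-7) is left alone, which matters for filings where the EIN is meant to be public.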
Malamud also encourages the IRS to stop distributing the database via DVDs that cost $2,910 apiece, make it freely available online, and then "put a feedback loop in place so people can notify you when problems are found."
In his pitch to Obama, Malamud sees bigger lessons here about getting ahead of technology problems before they become political disasters: "The healthcare.gov system is perhaps an opportunity to learn. Only after the system was a total fiasco and became a nightly story on CNN was enough firepower brought to bear to begin solving the problem." That's not an isolated incident, he says. "Bad IT," Malamud tells Obama, "has hobbled our entire federal government."
You can read Malamud's letters to Obama, Koskinen, Park, Secretary of the Treasury Jacob Lew, Treasury Department Inspector General J. Russell George and Archivist of the United States David Ferriero here.