The new data will make his job easier, but the information could also make him less relevant, since some users will be able to go directly to Congress to find the information he provides on his site, govtrack.us.
Currently, the Capitol’s most noticeable online fossil is the Library of Congress site named for Thomas Jefferson. On Thomas, users can search for bills by name, sponsor and subject. But they can look at only one bill at a time — divorced from the patterns, history and context that make all the difference on Capitol Hill.
“If you’re outside the Beltway, you can’t understand Thomas,” Tauberer said.
He began working to organize congressional data in 2001, while a freshman at Princeton University. “What I saw was an opportunity for something much better.”
The job wasn’t easy. Over 11 years, Tauberer has built a computer program that “scrapes” the balky Thomas site, searching twice a day for new bills, new votes and new co-sponsors. It takes that information and packages it in a more Web-friendly form on his site.
On Thomas, a no-chance measure like, say, H.R. 40 — which would establish a commission to study paying reparations for slavery — might look like any other bill. It has a veteran sponsor, Rep. John Conyers Jr. (D-Mich.), and it has been assigned to a House subcommittee.
On govtrack.us, it’s possible for an untrained user to find some of the context that a Hill staffer carries in her head. If you look up H.R. 40, Tauberer’s site notes that it’s a repeat bill that has failed again and again in Congress. Conyers’s party is out of power in the House. And, to begin with, only about 4 percent of House bills ever pass.
“This bill has a 0% chance of being enacted,” the site says.
A Conyers spokesman was only slightly more sanguine, saying: “Rep. Conyers hopes a greater understanding of the substance of H.R. 40 will increase the likelihood of its passage, but he understands it takes time to educate members and the public.”
In all, Tauberer has spent nearly a dozen years tracking the vast amounts of wasted time and dead-end bills that the new data should reveal to the world. “I don’t fault them for trying anymore,” he said of America’s legislators. “I actually think it’s kind of sweet.”
Tauberer isn’t the only one doing this: CQ Roll Call, for instance, has its own way of assembling the same data, which it provides to paying clients. But Tauberer is one of the few people who offer the data for free. His data has become fodder for sites like Maplight.org, which combines bill data with fundraising information to show how much has been donated by a bill’s supporters and opponents. At OpenCongress, users can find news reports about a bill and offer their comments on the text, line by line.
Tauberer says he wants Congress to release the data itself, because he worries about flaws in his homemade system. It lags behind the latest news, for one thing, and it can produce errors when the computer misreads something.
The nonprofit groups that depend on Tauberer’s data have a more morbid worry.
“What happens if he walks in front of a bus?” said Daniel Schuman of the nonprofit Sunlight Foundation. Where would the data come from then? “This is a question about basic information that the federal government should be giving to the American people.”
This is not the first time that Congress, led by middle-aged legislators whose best skills are verbal, has come late to a shift in electronic communications. The Capitol had a telegraph office, for instance, until 2007. The Senate still sends its messages by teenager, with its 183-year-old page program.
The House bill is intended as a step in the right direction. But the issue will be handed over to an institution far slower and older than the Internet: a congressional task force. It will decide how the bulk data will be released and how to protect information from being falsified or corrupted.
“I think everybody in Congress is happy to say, you know, ‘Here’s what we’re doing,’ ” said Rep. Ander Crenshaw (R-Fla.), who helped craft the bill that was voted on Friday. He said he recognized that the current system wasn’t enough: “I’m not a tech guy, but there’s a better way to do that.”