washingtonpost.com
The Next Frontier: Decoding the Internet's Raw Data

By Kim Hart
Monday, June 1, 2009

There's no shortage of uses for the massive amounts of data in every nook and cranny of the Internet.

Advertisers want to mine the photos and status updates you post on Facebook to better sell their wares. Scientists want to track weather patterns based on decades of climate records to better forecast troubling storms. And White House officials now want to make government data sets available for citizens to use however they see fit.

The problem is figuring out how to organize and display the data in a useful and informative way, instead of forcing people to sift through heaps of mind-numbing spreadsheets. When are bar graphs and pie charts enough to break down a set of numbers? What is the best way to display flu outbreaks, cellphone call logs or senators' voting records?

These are some of the questions that were debated last week by government researchers, computer science professors and corporate financial analysts who attended workshops at the annual symposium of the University of Maryland's Human-Computer Interaction Lab.

"We're trying to understand data and make sense of it visually, but there's no way of evaluating how effective these visuals really are for people," said Mave Houston, a research manager for PricewaterhouseCoopers. Part of her job is finding tools to help auditors and investigators examine complex financial data, such as diagrams that show relationships between revenue and expenditures.

Also in the room were analysts from the Department of Defense, SAIC and Lockheed Martin, who expressed frustrations that information visualization tools, or "infoviz" as some call it, are too complex for novice users. Or they don't work well with user-generated content. Or they can't handle large amounts of data.

One of the most common tools was used recently to track the spread of the H1N1 virus. It showed outbreaks as they occurred in various cities by using larger dots to show higher concentrations of reported illnesses and smaller dots for individual cases.
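As a rough illustration of that kind of dot map, the sketch below sizes each dot from a case count. The article does not say how the tool scaled its dots. Square-root scaling, which makes a dot's area rather than its radius proportional to the count, is a common cartographic convention and is assumed here. The function name `dot_radius`, the `base_radius` parameter and the city counts are all hypothetical.

```python
import math


def dot_radius(cases: int, base_radius: float = 2.0) -> float:
    """Return a dot radius whose circle area grows in proportion to `cases`.

    Assumption: square-root scaling, so that a city with four times the
    cases gets a dot with four times the area (twice the radius).
    """
    if cases < 0:
        raise ValueError("case count cannot be negative")
    return base_radius * math.sqrt(cases)


# Hypothetical case counts, for illustration only (not real outbreak data).
reported_cases = {"City A": 1, "City B": 4, "City C": 100}

# Map each city to the radius its dot would be drawn with.
radii = {city: dot_radius(count) for city, count in reported_cases.items()}

if __name__ == "__main__":
    for city, radius in radii.items():
        print(f"{city}: radius {radius:.1f}")
```

Scaling the radius directly would make large outbreaks look far bigger than they are, because the eye reads the area of a dot, not its width.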

Linking information, designing user-friendly technology devices and finding ways to improve people's interaction with the Web have been part of the Human-Computer Interaction Lab's mission since it was founded by Ben Shneiderman in 1983.

Since then, the lab has been credited with creating hyperlinks -- highlighted words in a document that direct Web surfers to another site -- even down to their characteristic light blue color. Shneiderman also developed a tool known as "treemaps," which display information as blocks of color to show hierarchical relationships. Hive Group, a Texas software company, licensed the technology and now uses it to help corporations display a variety of data, such as stock prices or computer systems.
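To give a sense of how a treemap turns a hierarchy into blocks, here is a minimal sketch of the "slice-and-dice" layout, the simplest treemap scheme. It is not the lab's or Hive Group's code. The `Node` class, the `slice_and_dice` function and the sample holdings are hypothetical, and the sketch computes only rectangle positions, not colors.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One item in the hierarchy; `size` is only used for leaves."""
    name: str
    size: float = 0.0
    children: List["Node"] = field(default_factory=list)

    def total(self) -> float:
        """Size of a leaf, or the summed size of everything beneath a branch."""
        if not self.children:
            return self.size
        return sum(child.total() for child in self.children)


# (name, x, y, width, height) for each leaf rectangle
Rect = Tuple[str, float, float, float, float]


def slice_and_dice(node: Node, x: float, y: float,
                   w: float, h: float, depth: int = 0) -> List[Rect]:
    """Split the rectangle among children in proportion to their totals,
    alternating between vertical and horizontal cuts at each level."""
    if not node.children:
        return [(node.name, x, y, w, h)]
    rects: List[Rect] = []
    total = node.total()
    offset = 0.0  # fraction of the parent rectangle already handed out
    for child in node.children:
        share = child.total() / total if total else 0.0
        if depth % 2 == 0:  # even levels: divide the width
            rects += slice_and_dice(child, x + offset * w, y,
                                    share * w, h, depth + 1)
        else:               # odd levels: divide the height
            rects += slice_and_dice(child, x, y + offset * h,
                                    w, share * h, depth + 1)
        offset += share
    return rects


# Hypothetical holdings grouped by sector, for illustration only.
tree = Node("Portfolio", children=[
    Node("Tech", children=[Node("A", 30), Node("B", 10)]),
    Node("Energy", children=[Node("C", 60)]),
])

if __name__ == "__main__":
    for name, rx, ry, rw, rh in slice_and_dice(tree, 0, 0, 100, 100):
        print(f"{name}: x={rx:.0f} y={ry:.0f} w={rw:.0f} h={rh:.0f}")
```

Each leaf ends up with an area proportional to its size, and items in the same group sit side by side inside their parent's rectangle, which is what lets a viewer read the hierarchy at a glance.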

"It's satisfying to see what was once considered esoteric research turn into mainstream computer science that has revolutionized industries," Shneiderman said last week. "Just think, YouTube works because designers made it easy to search for videos effectively. Now we have high school kids creating videos that get 5 million views."

While many of the lab's projects focus on consumer tools, such as improving Web site designs and developing devices that are easy for children to use, the lab is getting increased attention from policymakers looking to leverage technology for government needs.

Last week, Shneiderman met with federal Chief Information Officer Vivek Kundra and Deputy Chief Technology Officer Beth Noveck to discuss ways of improving public participation in policymaking. The lab also works with the Library of Congress, NASA and the National Archives to integrate technology into their services.

"Our belief is that technology is not just useful as toys or for business," Shneiderman said. "We're talking about using these technologies for national priorities."

David Wang, a computer science doctoral student who has worked in the lab for several years, has focused his research on electronic health records. Working with several Washington area hospitals, he is designing ways to organize time-sensitive patient data to keep better track of patients who need repeat treatments or who could qualify for drug trials or other procedures based on their medical histories.

Wang said he's suddenly received a lot of interest from medical groups and companies wanting to learn more about analyzing health records, now that more than $19 billion in stimulus funding has been allocated to digitizing the information.

"I guess there's buzz wherever there's money," he said. His project is open-source, so others can take a look at his progress and the code he is using to design his tools.

Allison Druin, the lab's director, said the researchers have become much more focused on partnering with nonprofits, non-governmental organizations and federal agencies. She said she's received a number of questions about improving access to information, providing data on new platforms such as cellphones and getting a better handle on online threats to children, such as cyber-bullying. All these questions, she said, have a direct impact on government actions.

"A lot of what we do affects policy and, of course, the policy affects the way we use technology," she said.

Kim Hart writes about the Washington technology scene every Monday. Contact her at hartk@washpost.com.

© 2009 The Washington Post Company