Search strings that include "xls," "cc," or "ssn" often bring up spreadsheets, credit card numbers, and Social Security numbers linked to a customer list. Adding the word "total" to a search often pulls up financial spreadsheets that total dollar figures. A hacker with enough time and experience recognizing sensitive content can find an alarming amount of supposedly private information.
"On a [client's] bank site, I found an Excel spreadsheet with 10,000 Social Security and credit card numbers," Skoudis said of one of his successful treasure hunts.
The bank's Web server had been properly configured to keep such documents private, but someone had mistakenly put the information on the wrong side of the fence, he said. "Google found the open door and crawled in."
Skoudis confronted the "red-faced executives" with his findings, he said, and was told: "Just fix it, damn it."
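The same keywords can be turned inward. What follows is a minimal sketch, not a tool described by anyone quoted here, of how a site administrator might check a public Web directory for the kinds of files those searches surface. The function names, the keyword list, and the choice of file extensions are illustrative assumptions. The script reads files as plain text, so it would catch a comma-separated export but not the contents of a binary Excel file, which it can flag by name and extension only.

```python
import re
from pathlib import Path

# Assumed keyword list, borrowed from the search terms quoted in the article.
RISKY_NAME_HINTS = {"ssn", "cc", "total"}
# Assumed set of spreadsheet-style extensions; adjust for the site in question.
RISKY_SUFFIXES = {".xls", ".csv"}

# U.S. Social Security numbers written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum used by card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def flag_text(text):
    """List the reasons a piece of text looks sensitive (empty list if none)."""
    reasons = []
    if SSN_PATTERN.search(text):
        reasons.append("ssn-like number")
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            reasons.append("card-like number")
            break
    return reasons


def scan_web_root(root):
    """Map each suspicious file under root (relative path) to its list of reasons."""
    root = Path(root)
    findings = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        reasons = []
        if path.suffix.lower() in RISKY_SUFFIXES:
            reasons.append("spreadsheet-type file")
        # Compare whole name tokens so that "cc" does not match "access".
        tokens = set(re.split(r"[^a-z0-9]+", path.stem.lower()))
        if tokens & RISKY_NAME_HINTS:
            reasons.append("keyword in file name")
        try:
            reasons.extend(flag_text(path.read_text(errors="ignore")))
        except OSError:
            pass  # unreadable file: keep whatever the name alone revealed
        if reasons:
            findings[path.relative_to(root).as_posix()] = reasons
    return findings
```

Calling `scan_web_root("/var/www/html")` (the path is only an example) returns a dictionary of files worth a second look. A pattern match is a hint, not proof, so each hit still needs a person to decide whether the file belongs on a public server.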
Google and other search-engine operators are unable to gauge how frequently private documents are accessed using their sites, or how many are removed for security reasons.
"The challenge is that as the search-engine tool evolved, people got more lax about what they put on a publicly available Web server," said Tom Wilde, vice president and general manager of Terra Lycos's 19 search engines. "It would be impossible to monitor" the tens of millions of searches that take place every day, Wilde said, adding that he has never been notified of a security breach on his sites.
Government officials said they were familiar with Google hacking, and were working with government agencies and businesses to secure sensitive documents on Web servers.
"It's an issue we're aware of and tracking," said Amit Yoran, director of the cybersecurity division of the Homeland Security Department. By law, each agency is responsible for its own security, and although hacking or security breaches are reported to Homeland Security, the cybersecurity division does not monitor the content of the Web, he said.
It is unclear who is at fault when someone digs up a confidential document.
"I don't know what law's been violated just for searching" on a publicly available search engine, said Paul Bresson, a spokesman for the FBI, noting the bureau has not yet taken actions against individuals who have found secure documents by using search engines. "If they use it for some sinister purpose, that's another issue."
The availability of private information contributes to the rising incidence of identity theft, which for the last four years has been the No. 1 consumer problem for the Federal Trade Commission. Last year the FTC received nearly 215,000 complaints about identity theft, up from about 152,000 in 2002.
Since 2001, the FTC has settled cases with Eli Lilly & Co., Microsoft Corp. and clothing maker Guess Inc. for not taking "reasonable" measures to keep medical or financial information secure, said Jessica Rich, assistant director of the commission's bureau of consumer protection. Letting customer information reside on an unsecured server can open a business to such liability.
"There are unique vulnerabilities because of databases that are accessible through the Web," Rich said, adding that the FTC anticipates bringing more security-related cases in the future.
Once confidential pages are found, it is not easy to get them back under wraps.
Even after a document has been pulled from a Web server, it often remains cached, or stored, on search engines' computers, where it can still be accessed. That was the case when MTV removed from its Web site a pre-Super Bowl press release promising "shocking moments" at the halftime show.
"Once it is placed online, it's very hard to get the digital horse back in the electronic barn," said Marc Rotenberg, executive director of the Electronic Privacy Information Center. "It's close to impossible to get it back."