AOL Search Queries Open Window Onto Users' Worlds
Thursday, August 17, 2006
Out of more than 36 million search queries that hundreds of thousands of AOL users typed into AOL's Internet search engine from March to May, here is the term most queried: Google.
That so many customers would use one search engine to find another is among the odd truths being mined from AOL's public release of search data. The company last week called the incident involving 658,000 users' queries a "screw-up" and apologized. But for better or worse, the data offer the first widespread public glimpse of how people search the Internet, of what they are interested in. Of how people think.
In just a week, the breach has spawned a cottage industry of Web sites and online commentary devoted to analyzing and parsing the data, which include Social Security numbers and potentially embarrassing searches, such as "bad breath could it be an infection in one of my teeth."
While acknowledging concerns about privacy, researchers said it is an opportunity to study how people search for information in a limitless universe of data.
Web sites have devoted themselves to combing through the information. There's http:/
SEOSleuth.com shows which Web sites were most visited by those AOL searchers: Google and MySpace were tops. There's even a site in German, Sistrix.com, that looks at search-term frequency.
Even privacy advocates who were outraged by the breach have analyzed the search strings, mainly so they can provide evidence to back up their claims about how invasive the data are. Although AOL assigned random ID numbers to each user, some search strings provide enough clues that anyone with access to databases of phone or Social Security numbers or addresses could try to link that data to a person.
A Washington Post analysis turned up at least 190 searches in the data set that appeared to contain a Social Security number and at least several thousand that contained possible telephone numbers.
JoAnn Whitman, a 55-year-old retired grocery store worker from Grand Junction, Colo., accidentally typed an order confirmation from Bed, Bath & Beyond into the AOL search engine on May 3. The entry included her name and address. Contacted by The Washington Post, she expressed dismay.
"They say, 'Oh, we'll protect it, but it's not secure,' " she said of the data. "I don't think that it's anybody else's business."
She said that she had not heard of the AOL data disclosure and that she was thankful there was nothing really embarrassing in her searches, which included queries to "www.mervynsboys shoes .com" and "www.Wellfargobank.com."
Paul Boutin, a technology columnist for the online magazine Slate, owned by The Washington Post Co., has created his own user typology with the data. In an article titled "You Are What You Search," he grouped users into seven categories, including the Pornhound, who shifts from "poems about a red rose" before midnight to "sexy dogs and hot girls" a half-hour later; the Newbie, including folks who type in http:/