By Leslie Walker
Washington Post Staff Writer
Thursday, October 28, 1999; Page E01
Imagine your boss giving you this assignment: Organize the billion pages on the World Wide Web into a system that will allow anyone to find anything within seconds. Bear in mind that as fast as you can devise your indexing system, the volume of pages likely will double and their contents will change.
Wouldn't you say, "What kind of fool do you think I am?"
Not Sergey Brin and Srinija Srinivasan. They and a small group of other young people see organizing the Internet as a fun puzzle instead of a nightmare akin to Sisyphus rolling his heavy rock uphill and watching it fall back down.
Both under 30, these former Stanford graduate students take contrasting approaches as they lead two of the Internet's popular organizing schemes: Yahoo Inc.'s directory and a search engine called Google. One relies on human intelligence (Yahoo's directory is compiled by several hundred human editors) and the other on machine smarts (Google uses computers to analyze a site's value, based on how other sites link to it).
No one knows who has the better shot, but as with the mythical Sisyphus, what's interesting is how each approaches a seemingly insurmountable task. They are skeptical of each other's work--disagreeing, basically, about whether computers can ever match the intuitive power of the human brain. One assumes responsibility for deciding what's important; the other is training computers to figure that out.
Listen to Srinivasan, 28, who has been organizing Yahoo's directory since it moved out of her friend Jerry Yang's dorm room in 1995: "We have a finite set of resources for a potentially infinite problem. The thing I lose sleep over is how can we make sure we are making the right decisions about where to spend our time."
Or Brin, 26, a University of Maryland computer science graduate who was three months shy of his PhD at Stanford last winter when he and a pal launched Google: "At our last analysis in September, we analyzed 400 million Web pages and 3 billion links on those pages. We are trying to bring something better to the world and make it easier for people to find information."
While directories organize sites into categories that can be browsed, search engines compile a gigantic index and scan it for relevant sites every time someone makes a query. Increasingly, major Web portals use both strategies, combining directories with a search engine that kicks in when users can't find what they want in a directory.
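For readers curious what "compile a gigantic index and scan it" looks like in miniature, here is a rough sketch of an inverted index in Python. It is an illustration only, not how any real search engine is built; the three page names and their text are invented, and the matching rule (every query word must appear on the page) is an assumption made for simplicity.

```python
from collections import defaultdict

# Hypothetical miniature "Web": page name -> page text (invented for illustration).
PAGES = {
    "a.example/mp3": "free mp3 music downloads",
    "b.example/film": "blair witch project movie review",
    "c.example/music": "music news and mp3 players",
}

def build_index(pages):
    """Map each word to the set of pages containing it (an inverted index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return pages containing every word in the query, sorted by name."""
    words = query.lower().split()
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)

# The index is built once; each query is then a quick lookup.
INDEX = build_index(PAGES)
print(search(INDEX, "mp3 music"))  # ['a.example/mp3', 'c.example/music']
```

The point of the design is that the expensive work, reading every page, happens ahead of time, so a query only has to consult the index rather than the Web itself.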
Some experts believe both strategies fall short. Yahoo's directory, for instance, has become so cluttered with categories and subcategories that users often click through a maze before reaching an actual Web site. At the same time, the clutter masks an equally big problem: A rapidly growing number of sites don't even make it into Yahoo.
Yahoo does not reveal its size, but competitors estimate the directory lists between 2 million and 4 million sites. Analysts believe the Web currently has nearly 10 million sites containing more than 1 billion pages (a research project counted 800 million pages early this year). Srinivasan's team of surfers adds 1,000 to 2,000 sites to Yahoo each day, while thousands more are submitted daily for evaluation. To help electronic-commerce sites deal with the backlog, Yahoo gives expedited reviews for a fee.
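A back-of-the-envelope calculation, using only the estimates above, suggests how wide the gap is. The sketch below assumes, unrealistically, that the Web stops growing at 10 million sites and that every remaining site is worth listing; the function name is made up for illustration.

```python
# Figures quoted in the article (competitor and analyst estimates, not Yahoo's own).
WEB_SITES = 10_000_000
DIRECTORY_ESTIMATES = (2_000_000, 4_000_000)
DAILY_ADDITIONS = (1_000, 2_000)

def days_to_close_gap(listed, per_day, total=WEB_SITES):
    """Days needed to list every remaining site, assuming the Web stopped growing."""
    return (total - listed) / per_day

# Best case: largest directory estimate, fastest pace. Worst case: the opposite.
BEST = days_to_close_gap(max(DIRECTORY_ESTIMATES), max(DAILY_ADDITIONS))
WORST = days_to_close_gap(min(DIRECTORY_ESTIMATES), min(DAILY_ADDITIONS))
print(f"{BEST / 365:.1f} to {WORST / 365:.1f} years")  # 8.2 to 21.9 years
```

Even under those generous assumptions, catching up would take somewhere between roughly 8 and 22 years.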
The widening gap between what's on the Web and what's retrievable does not worry Srinivasan, who long ago gave up on the idea that her team could catalogue it all. She likens her mission to that of a news team: Since it can't cover many more stories than it has people, it becomes an editorial challenge to decide what's "news" or what matters most to Web users worldwide. Think of it as the Web's equivalent of the 6 o'clock news, a new kind of value filter. To make their calls, Yahoo editors scour not only traditional media but also their own search logs, where terms like "MP3" or a surprise movie hit such as "The Blair Witch Project" might jump to the top.
"It's not rocket science," she says. "It's more of a gut feeling--an art, an editorial judgment about what are we going to get to today."
Google's Brin doubts human-compiled directories can continue tracking the Web in a meaningful way, considering that even powerful search engines that "crawl" Web pages by hopping from hyperlink to hyperlink are falling behind. A study by the NEC Research Institute early this year found that no search engine was indexing more than 16 percent of the Web.
If they can't be as big as the Web, maybe they can be smarter. That's the theory behind Google and other second-generation search services. Google's secret formula uses algorithms to assess the importance of a site based on the volume and "authority" of other sites linking to it. If a site is good, the thinking goes, many other Web sites will link to it. Google weighs the placement of links on each page, along with font sizes and capitalization. It has developed a loyal following among researchers for its knack of displaying highly relevant sites high in search lists.
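Google's formula is secret, so the following is not it. It is a simplified, PageRank-style sketch in Python of the general idea described above: a page scores higher when many pages link to it, and a link counts for more when the page making it is itself highly scored. The four-page link graph, the damping value of 0.85 and the number of iterations are all assumptions chosen for illustration, and the sketch ignores link placement, font sizes and capitalization.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages by incoming links, weighting each link by the linker's own score."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    scores = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score equally among the pages it links to.
                share = damping * scores[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for t in pages:
                    new[t] += damping * scores[page] / n
        scores = new
    return scores

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "hub": ["news", "shop"],
    "news": ["hub"],
    "shop": ["hub"],
    "fan": ["hub", "news"],
}
SCORES = link_scores(LINKS)
RANKED = sorted(SCORES, key=SCORES.get, reverse=True)
print(RANKED)  # ['hub', 'news', 'shop', 'fan']
```

In this toy graph "hub" ranks first because every other page links to it, while "fan," which nobody links to, ranks last no matter how many links it makes itself.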
While Yahoo has built a booming media business on the back of its directory, adding content and services to help it rake in hundreds of millions of dollars in advertising and commerce revenue, Google's business plan is still in the toddler stage. It debuted a clever ad system this month that displays book titles for sale from Amazon.com at the top of search return pages. The book titles, supposedly relating to queries people entered, are clearly marked as advertising. Google hopes to be able to match consumer searches with many other kinds of products from advertisers' inventories.
Yahoo's directory chief thinks Google is headed in the right direction, because its link analysis reflects human decisions. "Everyone who created a link made a choice to make a link," she notes. Yet Srinivasan does not believe artificial intelligence can ever truly match the unique ability of humans to draw fine distinctions and assess quality: "I haven't seen anything that even approaches 'This is good, and this is not.' "
Brin, on the other hand, believes his strategy will allow Google to keep pace with the Web because it will have more links to analyze as the global network expands. "That's our competitive advantage--we get smarter, not worse, as the Web gets bigger."
You have to admire his hubris. Hers too. Personally, I hope the Web doesn't become an unmanageable mass of data that renders their work irrelevant, like the boulder whose weight undid Sisyphus.
Eventually, we may all need protection from the data overload of the Web. Both Yahoo and Google seem well on the way to shifting the original goal of Web searching from finding all relevant documents to pinpointing a few good ones.
© 1999 The Washington Post Company