Of 11 major Internet search engines, Northern Light, Snap and AltaVista indexed the largest number of pages, a new survey has found, but it warns that all search engines are falling behind in efforts to keep up with the Web's exponential growth. No site indexed more than 16 percent of the Web.
The 800 million accessible Web pages were dominated by business interests, the survey concluded. Commercial material was found on 83 percent of the sites, followed by education and science, 6 percent; health, 2.8 percent; personal, 2.3 percent; pornography, 1.5 percent; community, 1.4 percent; government, 1.2 percent; and religion, 0.8 percent. The survey did not, however, address how many people visit the different types of sites.
There's no one "best" search engine out there, concluded researchers Steve Lawrence and C. Lee Giles at the NEC Research Institute in Princeton, N.J. The results achieved by 11 major Web engines are quite different because the amount of technology--and money--invested varies so much, Lawrence and Giles report in the July 8 issue of Nature (www.nature.com). Some engines set time limits on searches. Some have software that does better at ranking the relevance of search findings.
Search engines capable of sorting through the pages to answer a user's query are vital to the success of the online experience, noted the authors.
In their test last February, they submitted 1,050 identical queries to the 11 engines, determining each one's "hits" as a percentage of the total links recorded by all engines.
Northern Light led the way, finding 38.3 percent of the combined coverage of all 11 engines. But it also had the second-highest rate of invalid links--links to pages that didn't exist anymore. HotBot, which ranked fourth in coverage, had the fewest invalid links, followed by Microsoft and Excite.
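The relative-coverage figure described above can be illustrated with a minimal sketch. It assumes each engine's hits are available as a set of URLs; the engine names, URLs and the function name `relative_coverage` are hypothetical, and the sketch leaves out the invalid-link checks the researchers also reported.

```python
def relative_coverage(hits_by_engine):
    """Return each engine's hits as a percentage of the combined hits of all engines."""
    # The combined coverage is the union of every URL returned by any engine.
    combined = set().union(*hits_by_engine.values())
    if not combined:
        return {name: 0.0 for name in hits_by_engine}
    return {
        name: 100.0 * len(urls) / len(combined)
        for name, urls in hits_by_engine.items()
    }


# Hypothetical data: three made-up engines and five made-up pages.
sample_hits = {
    "engine_a": {"page1", "page2", "page3"},
    "engine_b": {"page2", "page4"},
    "engine_c": {"page5"},
}

coverage = relative_coverage(sample_hits)
for name, percent in sorted(coverage.items()):
    print(f"{name}: {percent:.1f} percent of combined coverage")
```

Because the denominator is only what the engines found between them, a high score on this measure can still correspond to a small share of the whole Web.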
Another performance benchmark was a site's freshness, which the authors tested by finding the time each engine took to index changes to a selection of existing pages. The most up-to-date search engines were AltaVista, Excite and HotBot.
Because none of the search engines is keeping up with Web growth, running a search on several engines (or on a "metasearch" engine, such as WebCrawler, that queries several engines at once) will pay dividends, they said.
They also note that some engines, like Google and DirectHit, use popularity measures in ranking the relevance of pages produced in a Web search. (DirectHit, for example, uses the number of times links have been selected in previous searches to rank pages.)
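A toy sketch of click-based ranking of the kind described above follows. It is not DirectHit's actual method, which the article does not detail; the page names, click counts and the function `rank_by_clicks` are all hypothetical.

```python
from collections import Counter


def rank_by_clicks(results, click_counts):
    """Order results so that pages selected most often in past searches come first."""
    return sorted(results, key=lambda url: click_counts.get(url, 0), reverse=True)


# Hypothetical click history: one established page, one middling page,
# and one new page that nobody has selected yet.
clicks = Counter({"established-page": 50, "middling-page": 5})
results = ["new-page", "middling-page", "established-page"]

ranked = rank_by_clicks(results, clicks)

# Assume the user selects the top-ranked result, which adds to its count
# and strengthens its position in the next search.
clicks[ranked[0]] += 1
```

In this toy model the page with no click history always sorts last, so it has the fewest chances to be selected.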
"We can see a cycle where popular pages become more popular, while new, unlinked pages have an increasingly difficult time becoming visible in search-engine listings," the authors say.
Search Engines' Coverage
Percent of each engine's hits compared with the combined coverage of all engines:
SOURCE: NEC Research Institute