How 'content farms' beat Google, and what search engines should do about it

By Rob Pegoraro
Washington Post Staff Writer
Saturday, January 29, 2011; 6:27 PM

Google can give you free long-distance calling and provide driving, walking, transit or bicycling directions to almost anywhere in the world. But can it find information on the Web when you ask?

The Mountain View, Calif., company has made an unusual confession: It's having some trouble with its original and primary task.

As my colleague Michael Rosenwald writes, Google's reputation for uncanny accuracy has been dulled by "content farm" sites that game its search system to boost the visibility of pages many readers say they don't want.

Google search engineer Matt Cutts's Jan. 21 blog post acknowledged their complaints: "we hear the feedback from the web loud and clear: people are asking for even stronger action on content farms and sites that consist primarily of spammy or low-quality content."

It's easiest to see this problem if you search for a review of a product or instructions on how to do something and find yourself looking at dozens of irrelevant results that don't answer your question or that rip off another site's work.

There's not much point in getting too mad at these sites. They're simply following a prime directive of the commercial Web: Get people looking at your site, then use advertising - often placed through Google's services - to transmute that traffic into money.

(We in the media play by these rules, too. Building Web traffic through "search engine optimization" has become a major part of a journalist's job.)

Meanwhile, plenty of writers, photographers, videographers and editors are willing to accept minimal per-product payments to crank out a large volume of posts that match up with common Google searches.

As one result of this dynamic, Yahoo paid $100 million for a content mill named Associated Content last May. The best-known company in this category, Demand Media, staged its initial public offering Wednesday and closed the day with a higher market value than the New York Times Co.

Both sites would dispute the content-farm categorization, and I've seen each publish useful information. But I've also read plenty of dreck at these sites, and there's no disputing their business model of mass-producing content to fit, Lego-like, with search terms.

In essence, Google has unintentionally been teaching to the test - and some of its students have learned all too well.

As Rosenwald's article suggests, this opens up an opportunity for social-networking sites that connect people with trusted, knowledgeable friends to beat Google at its own game. But by requiring your identity to work, they incorporate privacy and security risks.

It will not be a healthy development for the Web if finding useful data online requires a username and password.

Google competitors such as Microsoft's Bing and smaller rivals such as the community-curated Blekko or the data-oriented Wolfram Alpha have an excellent opportunity to do a better job of connecting people to information they need - not just pages that try to look useful to a search engine's automated indexing system.

But Google will have to become a little pickier too, as Cutts's blog post suggests it will. And as that happens, Google may run into a second problem: "search neutrality."

The term surfaced a few years ago, in part because opponents of net-neutrality regulations began talking up that angle. (In retrospect, that PR tactic looks like a clever exploitation of the media's weakness for he-said/she-said stories that look "balanced" by quoting each side attacking the other.)

The notion carries a lot of built-in ridiculousness. Web search is inherently an editorial act - and not an easy one, either. You're asking a site to sift through about a trillion pages and find the ones most relevant in less than a second. The whole point of search - as in journalism - is to exercise bias against things judged to be less relevant.

But when adjusting search algorithms to devalue mass-produced, low-quality content can knock the legs out from underneath a content farm's business model, you don't need a search engine to predict that calls for "search neutrality" will increase.

Google and other search sites can insulate themselves from some of this pressure by being more transparent about how they revise their search systems.

Google in particular should take one extra step: Allow users of Android phones to change the default search engine in its mobile operating system as easily as they can in Google's Chrome browser.

But transparency and openness - not to mention the fact that nobody is being forced to use Google - may not be enough to fend off calls for regulation. The European Union's antitrust regulators launched an investigation of Google last year and have begun asking sites whether they think Google manipulates search results. Google thinks enough of this possibility that it sent Cutts to Washington two weeks ago to talk to policymakers and journalists, myself included, about these issues.

He and Google don't have an easy task in store. They're going to need more than a content farm's how-to write-up to get through it.

© 2011 The Washington Post Company