Publishers Seeking Web Controls

Tom Curley of the Associated Press said Automated Content Access Protocol will protect AP's news reports from being distributed without permission. (By Lawrence Jackson -- Associated Press)
By Anick Jesdanun
Associated Press
Friday, November 30, 2007

The desire for greater control over how search engines index and display Web sites is driving an effort launched yesterday by leading news organizations and other publishers to revise a 13-year-old technology for restricting access.

Currently, Google, Yahoo and other top search companies voluntarily respect a Web site's wishes as declared in a text file known as robots.txt, which a search engine's indexing software, called a crawler, knows to look for on a site.

But as search engines expanded to offer services for displaying news and scanning printed books, news organizations and book publishers began to complain.

News publishers said that Google was posting their news summaries, headlines and photos without permission. Google claimed that the "fair use" provisions of copyright law applied, though it eventually settled a lawsuit with Agence France-Presse and agreed to pay the Associated Press without a suit having been filed. Financial terms haven't been disclosed.

The proposed extensions, known as Automated Content Access Protocol, partly grew out of those disputes. Leading the ACAP effort were groups representing publishers of newspapers, magazines, online databases, books and journals. The AP is one of dozens of organizations that have joined ACAP.

The current rules allow a site to block indexing of individual Web pages, specific directories or the entire site, though some search engines have added their own commands.
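The blocking granularity described above follows the standard robots.txt conventions; the sketch below uses the long-established directives, though the site paths and crawler name are hypothetical examples:

```
# Placed at the site root, e.g. http://example.com/robots.txt
User-agent: *
Disallow: /private/            # block a specific directory
Disallow: /drafts/page1.html   # block an individual page

User-agent: SomeCrawler        # hypothetical crawler name
Disallow: /                    # block this crawler from the entire site
```

A crawler that honors the protocol fetches this file before indexing and skips any path matching a Disallow rule for its user-agent.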

The proposal, unveiled by a consortium of publishers at the AP's global headquarters, seeks to have those extra commands -- and more -- apply across the board. Sites could try to limit how long search engines may retain copies in their indexes, for instance, or tell the crawler not to follow any of the links that appear within a Web page.
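The article does not reproduce the proposed syntax, but extensions of this kind would sit alongside the familiar directives. The directive names below are purely illustrative placeholders, not the actual ACAP specification:

```
User-agent: *
Disallow: /archive/

# Hypothetical extended directives of the sort ACAP proposes:
ACAP-cache-limit: 14d      # hypothetical: limit cached-copy retention to 14 days
ACAP-no-follow: /news/     # hypothetical: don't follow links on matching pages
```

The design question such a scheme raises, as Google's response below suggests, is whether every major crawler would interpret richer directives consistently across millions of sites.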

"ACAP was born, in part at least, against a growing backdrop of mistrust," said Gavin O'Reilly, president of the World Association of Newspapers.

The current system doesn't give sites "enough flexibility to express our terms and conditions on access and use of content," said Angela Mills Wade, executive director of the European Publishers Council, one of the organizations behind the proposal. "That is not surprising. It was invented in the 1990s and things move on."

Tom Curley, the AP's chief executive, said the news cooperative spends hundreds of millions of dollars annually covering the world, and that its employees often risk their lives doing so. Technologies such as ACAP, he said, are important to protect AP's original news reports from sites that distribute them without permission.

"The free riding deprives AP of economic returns on its investments," he said.

Jessica Powell, a spokeswoman for Google, said the company supported all efforts to bring Web sites and search engines together but needed to evaluate ACAP to ensure it can meet the needs of millions of Web sites, not just those of a single community.

"Before you go and take something entirely on board, you need to make sure it works for everyone," Powell said.

© 2007 The Washington Post Company