washingtonpost.com
Google's Goal: A Worldwide Web of Books

By Leslie Walker
Washington Post Staff Writer
Thursday, May 18, 2006

It's odd to hear Vinton Cerf, regarded as one of the founding fathers of the Internet, gush over ink-on-paper books.

The electronic pioneer and computer scientist, who now works as Google's chief Internet evangelist, is also a bibliophile who has a collection of about 10,000 hard-copy volumes lining shelves at his home in McLean.

These days, Cerf is busy promoting Google's plan to marry his two passions -- books and the Internet -- by digitizing millions of library books. He recently dropped by my office to explain the controversial plan and talk about its implications for book lovers.

As Cerf talked about his personal book collection and the limitations of having knowledge fixed on paper, he got me thinking about how reading will be transformed when static libraries join the more dynamic world of cross-referenced knowledge on the Web.

For starters, Cerf said, libraries are not exactly easy to navigate.

"Think for a moment about the dead-tree problem," he said. "When you stand in your own personal library looking for something and you realize that A, you can't remember which book it was in, and B, there's no way you can go through manually looking at all the pages, then you think, 'God, I wish all this stuff was online.' "

That's the stated goal of Google's library project, to create a massive electronic card catalog that will help people find information in published books, much as Google already does with Web pages.

Google has vowed to create a full-text index of seven million books in the University of Michigan library, along with millions more in the university libraries at Harvard, Stanford and Oxford, as well as the New York Public Library. The idea is similar to Amazon.com's "search inside the book" feature, eventually allowing anyone using Google's free book search ( http://books.google.com/ ) not only to see sample pages from books but also to search their contents and find excerpts matching search terms.

Google is not alone in trying to digitize library books. Yahoo, Microsoft and other Internet players have joined a collaborative effort called the Open Content Alliance, which is planning to digitize not only library books but other types of multimedia as well, making them all accessible on the Web.

Google, however, has embarked on a solo book project that is much further along than the collaborative effort. The Internet search leader has developed technology for bulk scanning of books and started scanning them at the University of Michigan, much to the consternation of the publishing industry. The Authors Guild and a group of publishers have accused the search giant of copyright infringement in two lawsuits filed last fall.

Several of those same publishers were -- and still are -- Google's partners in a program announced in the fall of 2004 to scan in-print books provided by publishers. That plan called for making books searchable online and sharing with publishers the revenue Google gets from showing text ads alongside book search results.

But publishers cried foul a few months later when Google announced it was expanding its book search to include millions of library tomes. Unlike the initial plan, the library project involves scanning many books that are either clearly under copyright or for which the copyright status is unclear.

Google contends the project falls within the "fair use" exemption to copyright law, because it is not providing full access to copyrighted books, merely letting people search inside and see excerpts. Google's book search service, still in trial mode, allows people to read the full text only of books in the public domain and shows sample pages from books for which publishers have granted Google sampling rights.

But for most books it is scanning, Google argues the copyright status is unclear and therefore shows more limited excerpts. Google refers to them as "snippets," raggedy images of a few lines of text from inside, with information about who published each book and when.

But at least five publishing houses disagree that the "snippets" constitute fair use. In a lawsuit filed in October, McGraw-Hill, Simon & Schuster and three other publishers charged that Google is violating copyright law because, in order to prepare the snippets, it is making and storing on computers unauthorized full copies of their books. And while Google tells the public its goal is simply to make books searchable, the suit alleges that Google's aim is to get more visitors so it can sell more ads.

"The question you have to ask is whether book search is an asset to Google," said Allan Adler, vice president for legal and government affairs for the Association of American Publishers. "Of course it is. It's one way it can differentiate itself from the competition."

Cerf thinks publishers fail to appreciate that Google probably will help them sell more books by making them searchable. Helping people locate a book and know what's in it, he said, are key steps toward getting them to buy it. And for many books that are available for sale, Google provides links to Amazon.com and other online sellers. Google does not sell books.

For now, Google is showing no ads alongside search results involving books from libraries, only books provided by publishers. In those cases, publishers are receiving a share of the ad revenue.

Google also recently announced it will soon allow publishers and copyright holders to sell full electronic access to books through Google book search, by letting people either read the text online or download copies. Google will take a 30 percent commission on any fees publishers collect.

What Google has not announced, but is likely to one day, are ways it might help publishers and authors enhance pages from printed books once they are online.

Cerf refers to this as "books that talk to each other," an idea to make them more like the rest of the Web where pages are cross-linked and visitors can annotate and tag text as is done with Web logs.

"Because the Internet is a computing environment, a software environment, it's possible to create a much richer kind of information than what we are typically accustomed to in books," Cerf said. Digitized books, he said, can be searched and updated easily, linked to related material, and enhanced with audio and video. But they can also be changed, which means that the book you read a year ago may look different the next time you consult it.

As his attention turned back to his personal book collection, his eyes lit up as he imagined searching its contents from a BlackBerry. Listening to him, I couldn't help thinking how inevitable it is that library books will move online and come alive with hyperlinks and annotations, the way the Web already has.

And then everyone, not just the Vinton Cerfs of the world, will have access to vast personal libraries from the comforts of home.

Leslie Walker welcomes e-mail at leslie@lesliewalker.com.

© 2006 The Washington Post Company