Google's Goal: A Worldwide Web of Books
Thursday, May 18, 2006
It's odd to hear Vinton Cerf, regarded as one of the founding fathers of the Internet, gush over ink-on-paper books.
The electronic pioneer and computer scientist, who now works as Google's chief Internet evangelist, is also a bibliophile who has a collection of about 10,000 hard-copy volumes lining shelves at his home in McLean.
These days, Cerf is busy promoting Google's plan to marry his two passions -- books and the Internet -- by digitizing millions of library books. He recently dropped by my office to explain the controversial plan and talk about its implications for book lovers.
As Cerf talked about his personal book collection and the limitations of having knowledge fixed on paper, he got me thinking about how reading will be transformed when static libraries join the more dynamic world of cross-referenced knowledge on the Web.
For starters, Cerf said, libraries are not exactly easy to navigate.
"Think for a moment about the dead-tree problem," he said. "When you stand in your own personal library looking for something and you realize that A, you can't remember which book it was in, and B, there's no way you can go through manually looking at all the pages, then you think, 'God, I wish all this stuff was online.' "
That's the stated goal of Google's library project: to create a massive electronic card catalog that will help people find information in published books, much as Google already does with Web pages.
Google has vowed to create a full-text index of seven million books in the University of Michigan library, along with millions more in the university libraries at Harvard, Stanford and Oxford, as well as the New York Public Library. The idea is similar to Amazon.com's "search inside the book" feature, eventually allowing anyone using Google's free book search ( http:/
Google is not alone in trying to digitize library books. Yahoo, Microsoft and other Internet players have joined a collaborative effort called the Open Content Alliance, which plans to digitize not only library books but also multimedia material, and to make all of it accessible on the Web.
Google, however, has embarked on a solo book project that is much further along than the collaborative effort. The Internet search leader has developed technology for bulk scanning of books and started scanning them at the University of Michigan, much to the consternation of the publishing industry. The Authors Guild and a group of publishers have accused the search giant of copyright infringement in two lawsuits filed last fall.
Several of those same publishers were -- and still are -- Google's partners in a program announced in the fall of 2004 to scan in-print books provided by publishers. That plan called for making books searchable online and sharing with publishers the revenue Google gets from showing text ads alongside book search results.
But publishers cried foul a few months later when Google announced it was expanding its book search to include millions of library tomes. Unlike the initial plan, the library project involves scanning many books that are either clearly under copyright or for which the copyright status is unclear.