washingtonpost.com
Search Me?
Google Wants to Digitize Every Book. Publishers Say Read the Fine Print First.

By Bob Thompson
Washington Post Staff Writer
Sunday, August 13, 2006

STANFORD, Calif. -- If it is really true that Google is going to digitize the roughly 9 million books in the libraries of Stanford University, then you can be sure that the folks who brought you the world's most ambitious search engine will come, in due time, for call number E169 D3.

Google workers will pull Lillian Dean's 1950 travelogue "This Is Our Land" -- the story of one family's "pleasant and soul-satisfying auto journey across our continent" -- from a shelf in the second-floor stacks of the Cecil H. Green Library. They will place the slim blue volume on a book cart, wheel it into a Google truck backed up to the library's loading dock and whisk it a few miles southeast to the Googleplex, the $100 billion-plus company's sprawling, campuslike headquarters in Mountain View. There, at an undisclosed location, it will be scanned and added to the ever-expanding universe of digitally searchable knowledge.

Why undisclosed?

Because for one thing, in their race to assemble the greatest digital library the world has ever seen, Google's engineers have developed sophisticated technology they'd prefer their competitors not see.

And for another, perhaps -- though Google executives don't say so directly -- the library scanning program already has generated a little too much heat.

Last fall, the Authors Guild and a group of major publishing houses filed separate suits in U.S. District Court in Manhattan, charging Google with copyright infringement on a massive scale. Google argues that under the "fair use" provisions of copyright law, it has a perfect right to let its users search the text of copyrighted works -- as long as, once the search is complete, it only shows them what it calls "snippets" of those works. Nonsense, say the authors and publishers: In order to find and display those snippets, Google must first copy whole books without permission.

Books like E169 D3 -- which finds itself smack at the heart of this contested legal territory.

"Great example," says Andrew Herkovic, the communications and development director for Stanford's libraries, as he pauses to consider "This Is Our Land" during a Green Library tour.

There's a 10-to-1 chance, Herkovic estimates, that its copyright expired without being renewed, which would put it safely in the public domain.

But "if you were the corporate counsel for Stanford, Google or anybody else, is 10 to 1 good enough?"

California Dreamin'

To travel to Silicon Valley and consider the fate of E169 D3 -- along with the tens of millions of other volumes Google hopes to scan, from Stanford and a number of other major libraries -- is to open a window on the future of books in the digital age.

It's also to be swept up in the saga of Google itself: the seat-of-the-pants enterprise that computer science whizzes Sergey Brin and Larry Page moved out of their cramped quarters at Stanford in 1998 -- just eight years ago! -- and into Susan Wojcicki's garage.

Silicon Valley cliche to the contrary, Wojcicki -- who was a friend of a friend of Brin's with a new house and mortgage worries -- says she rented them more than just a garage. "They had the garage and three bedrooms and two bathrooms," she recalls, confirming and clarifying the Legend of the Google Guys. "Yes, they stored stuff in the garage and they had servers and they had meetings. But it was winter, so it was actually kind of cold."

Laughing, she continues:

"The washing machine was in the garage, too. That was considered a key asset at the time."

Wojcicki is now the company's vice president for product management. As such, she's been involved in the book-scanning project for years. She's talking, on this blue-sky California day, in a small conference room crammed with colorful beanbag chairs. Outside, the lunchtime barbecue is over -- Google is famous for its perpetually free food -- and people zip from building to building on bright yellow motorized scooters.

Digitizing all the world's books "was an idea of Sergey and Larry's from very early on," Wojcicki says. In fact, they were supposed to be working on a small library digitization project "when they wound up creating a search engine, which today we know as Google."

Brin and Page tried to "monetize" their brainchild by peddling it to established Internet companies. When that didn't work, they switched to an advertising strategy -- but one that differed fundamentally from most Web advertising at the time. Rather than intrusive banner ads or pop-ups, the pair went with text-only advertising tied to the key words Google searchers typed in.

Worked like a charm. In the summer of 2004, Google went public and the Google Guys became instant multibillionaires. Google employees and investors (Stanford prominent among them) got a lot richer as well.

You'd think building a company that "may supplant Microsoft as the most important -- and most profitable -- corporation ever created" (as journalist John Battelle put it in his 2005 book "The Search") would have kept the pair busy enough. But no: According to Wojcicki, they never lost sight of their digital library dream.

She first heard them talk about it in early 2000, when "we didn't have the resources even to do our core business." But Brin and Page did more than just talk, even then:

"They actually would do some of the math behind it," Wojcicki says, "and calculate, like, how many machines it would take, how many hours it would take. So they knew with certain assumptions that it was a doable project."

It got more doable as the bucks started pouring in.

Brin and Page set a team of engineers to work on scanning technology. Later, they asked Wojcicki and her people to start acquiring books to scan. The first move was to negotiate with publishers for access to their current books.

Product manager Adam Smith explains how these deals work. With the publisher's permission, Google scans the full texts and makes the books searchable by key word. Users can't download a whole book, Smith says, but they can see sample pages -- "publishers can set a dial in terms of how much" -- and Google offers links to sites where the books can be purchased.

"The partner program is really an online marketing tool to help publishers," says content partnerships director Jim Gerber, who works with Smith and Wojcicki. Most major houses have signed on.

So far, so good.

But as Googlers will tell you, over and over, the goal of Google Book Search -- the current name for the overall scanning program -- is "to create a comprehensive, full-text searchable database of all the world's books."

Not some. All.

"We're Google. We like doing things at scale," as Wojcicki puts it.

Problem was, fewer than 5 percent of "all the world's books" were in print and available from Google's publishing partners.

So where were the rest of them going to come from?

'The Final Encyclopedia'

Sometime in 2002, Stanford's head librarian, Michael Keller, got an invitation to an exclusive gathering that would change his professional life.

The host was Microsoft billionaire Paul Allen. The location was Allen's place in the San Juan Islands, near Seattle, where a dozen or so high-level information technologists convened with an agenda that grew out of Allen's fascination with a science fiction novel called "The Final Encyclopedia."

"You know that novel? Gordon Dickson?" Keller asks. "It informed Paul's thinking. His question was: Are we near the point where we can have every piece of information, every fact, every record of every opinion and attitude, every bit of criticism, all the history of all the world's decisions and so forth . . . in one giant database?"

Google's Page had been invited to the San Juans, too. He and Keller talked. In September 2003, Keller and Herkovic drove down to Mountain View to hear a proposition from Page and some other Googlers.

"It was a very short conversation," Keller says. "Basically they said, 'What do you think about digitizing every book in the library?' And we said, 'Yay!' "

Stanford's librarian scarcely needed convincing about what digitization could do. After the university digitized its card catalogue, he says, use of the collection jumped 50 percent -- simply because books were easier to find. Another successful Stanford venture, HighWire Press, offers access to digitized scholarly journal articles.

Meanwhile, the library has been scanning books itself for decades. A few years ago, it bought a Swiss-made robotic scanner and set it to work in the Green basement. With 50 such robots, Keller calculated -- at a capital cost of something like $75 million -- the university could digitize its library all by itself. He got a few foundations interested, but they backed off.

Small wonder that when the Google offer came along, Keller jumped at it.

Not without a lot of due diligence, however, mostly about the legality of including books like call number E169 D3.

"Copyright 1950, Vantage Press, Inc. All Rights Reserved" reads the notice in "This Is Our Land" -- a clear enough warning at the time, but what does it mean, more than half a century later? The book is not in print, a fact that is easily ascertained. But does Vantage Press even still exist? Was the copyright ever renewed, and if so, who owns it now: the publisher, the author or the author's heirs? These questions are not so easily answered.

Most important, perhaps, even assuming Dean's book is still under copyright, would it be "fair use" for Google to copy it anyway, allowing it to be searched but making only "snippets" of text available for public view? (Fair use is a section of U.S. copyright law that allows portions of a work to be reproduced without permission under certain circumstances -- for example, in criticism, news reporting and scholarship.)

Keller asked Stanford's general counsel to help him consider this question. He consulted Stanford law professors and outside copyright experts, too. "We end up having a big seance," he says. "We get lots of opinions."

He makes no bones about what he was really after. Having his library included in Google's searchable database will be a fine thing, he says, but the real benefit to Stanford will come from the newly digitized copies of its own books that the university will receive from Google as a quid pro quo.

Keller starts ticking off the reasons they'll be so important. One is preservation. "We don't have enough invested in this country," he says, "to assure that printed materials are going to persist." Another is the potential for truly complex search. The library, he says, can help its users mine data in far more sophisticated ways than Google's key word approach allows.

A fully digitized library, Keller enthuses, will be an unbelievable new intellectual resource: a "test bed" in which everyone from anthropologists to zoologists can experiment with varieties of research impossible to imagine before.

Why not go for it?

When Google announced the library scanning project, in December 2004, it had four library partners besides Stanford. Two of them (Oxford University and the New York Public Library) took a legally cautious approach to digitization, permitting Google to copy only public domain works. A third, the University of Michigan, took the opposite view, asserting forcefully that Google could scan every one of its 7 million books. Harvard hedged its bets, initially agreeing only to a limited test program. Last week, the University of California signed on as a sixth Google partner. Its scanning program will include both public domain and copyrighted material.

Stanford, despite Keller's enthusiasm, is still hedging a bit. The librarian believes that scanning even in-print books would be legal. For the time being, however -- because who knows when those lawsuits will be resolved -- only out-of-print material is getting trucked down to Mountain View.

"But you've got to hear me talk about those two suits," Keller says. "I can't wait for them to come up."

He proceeds to explain, vehemently and at some length, why Google's use of copyrighted work is "transformative" (part of the legal definition of fair use) and why search doesn't hurt the marketplace for a book (another fair use criterion).

"Transforming all the words in the book into a giant index is wrong somehow? Give me a break," he says. "And someone's going to get paid for that? Give me a bigger break."

But getting paid is what it comes down to, he thinks -- and the lawsuits are a way to force the issue.

"If you look at what the publishers are asking," Keller says, "I think they're trying to get Google to negotiate."

Permission, Permission

If Allan Adler were in the same room with Keller, he'd likely be saying: Of course publishers want to negotiate! The whole problem is that Google won't!

Instead, the vice president for legal and government affairs for the Association of American Publishers sits in the trade association's offices at the foot of Capitol Hill, shaking his head at what he sees as the breathtaking arrogance of it all.

"In order to provide online searchability," Adler says, Google has to create "a proprietary database that in essence would be the world's largest digital library." Extremely impressive, way cool -- and clearly of enormous value, or the company wouldn't be spending so much to do it.

From New York, Authors Guild Executive Director Paul Aiken echoes Adler's incredulity. "It's an attempt to avoid licensing," Aiken says. "Without the ability to say no, a rights holder really has nothing to license."

All together now: What part of "we own the copyright" doesn't Google understand?

The Googlers certainly seemed to understand it, Adler says, when they negotiated with publishers for the right to copy and search their in-print books. Both sides were happy with that part of the Book Search program, which Google announced in October 2004.

Just two months later, the company announced its library deals.

It took a while for the publishers to react. Individual houses talked to Google, but it wasn't until the spring that they got concerned enough collectively to ask their trade association to intervene. In July, at the AAP's New York headquarters, Adler and other publishing representatives met with Smith, Gerber and Google CEO Eric Schmidt. The focus, Adler says, was on what to do with the millions of noncurrent titles that are not yet in the public domain.

"We were essentially told, 'Look, this is a problem of scalability,' " Adler says. Google was going to be "backing up trucks" to collect books for scanning. How could it puzzle out copyright status book by book?

Three weeks after the meeting, Google surprised the publishers with a unilateral move. The company had always said it would respect an author or publisher's request to "opt out" of the Book Search program after a book was scanned. Now it would accept opt-out requests in advance. To facilitate this, it declared a three-month scanning moratorium.

No, no, no, said the publishers. We should be in control here: You need us to opt in.

On Sept. 20, 2005, the Authors Guild filed a class action suit against Google, seeking statutory damages and an injunction to halt the scanning. A month later, five major publishers -- McGraw-Hill, Pearson Education, Penguin Group (USA), Simon & Schuster and John Wiley & Sons -- sued as well, with the support of the AAP. The publishers didn't ask for damages because they didn't want the focus to be on money.

Permission, permission is their refrain.

Listen long enough to both sides in this dispute and your head will spin with legal citations and passionate argument. But it's possible to isolate key points of contention. Among them:

· Copyright and fair use: As Google's Gerber puts it, the two sides obviously have a "fundamental difference about what is required to build an index of information." Because whole books or even whole pages are not displayed, Gerber and his colleagues argue, making copyrighted books searchable is the kind of "transformative use" permitted under copyright law. The publishers and the Authors Guild completely disagree, arguing that Google's unlicensed creation and retention of digital copies -- as well as its creation of additional copies for the libraries -- are illegal.

· Money and motivation: "Google would like the world to see this as a purely altruistic act on its part," says the AAP's Adler. Instead, he argues, searchable books are part of the company's "very brilliant economic strategy" for differentiating itself from competitive search engines. If you're worried that Yahoo, Microsoft or some unknown startup will scoop up lucrative market share, adding books to your database helps you stay ahead.

Google executives downplay this analysis but don't deny it. "The reason we're doing it," Wojcicki says, is that "making Google more comprehensive will yield a better search experience." Yes, that should lead -- eventually -- to more users and more revenue. But Book Search, she cautions, also represents a huge outlay of capital and isn't guaranteed to pay off anytime soon. It's a risk, as Gerber points out, you don't see publishers lining up to take.

· The Web search analogy: This gets a bit complicated, but it's crucial to understanding the dispute over Google's library scanning. Wojcicki, Smith, Gerber and Google attorney Alexander Macgillivray -- whom Smith calls "our thought leader" on intellectual property issues -- all insist that there's very little difference between the basic functioning of their Web search engine and Book Search.

The comparison goes like this:

To index the Web, Google first sends out software programs called "crawlers" that explore the online universe, link by link, making copies of every site they find -- just as Book Search makes a digital copy of every book it can lay its hands on. Web sites are protected by copyright, so if you don't want your site indexed by Google and its search brethren, you can "opt out," usually by employing a nifty technological watchdog (a file called robots.txt) that tells search engines to bug off.

Ditto for books, Google argues: Publishers and authors can opt out by informing Google that they don't want their books scanned and made searchable.

The analogy carries a risk for Google. Former Wired editor Kevin Kelly, one of the most influential journalists covering the digital revolution, sums it up this way: "If they capitulate on this with the publishers, they jeopardize their entire ability to search the Web."

Google executives don't sound worried. "No judge is going to rule that Web search is illegal," Macgillivray says. Still, they're on the horns of a dilemma. To use the Web analogy in court is on some level to bet the company, however favorable the odds.

No need to fret, say the publishers: The analogy fails in any case.

Most Web sites, they point out, are designed to be free. Books are not. As for the "opt out" requirement, as one high-ranking publishing executive explains it -- he doesn't want to be named; odds are he'll be dealing with Google in the future -- publishing houses have already installed a perfectly good, low-tech version of robots.txt.

"It's called a price," he says.

'Don't Be Evil'

Five years ago, Google's head of human resources rounded up a dozen or so early employees and asked them to try to identify the company's core values. As Battelle reports in "The Search," instead of the usual mush of corporate platitudes, a striking three-word slogan emerged: "Don't be evil."

As company mottos go, it was succinct, distinctive -- and just a tiny bit hard to live up to.

Eight years after Page and Brin incorporated Google and took over Wojcicki's garage, the company still retains some of its don't-be-evil halo. It offers a wonderfully efficient, free tool now used by countless millions around the globe. It does many things its own way, and a lot of them seem admirable: When it went public, for example, it insisted on a process that would circumvent Wall Street's usual insider cronyism and make Google stock equally available to anyone who could afford five shares.

But when you're suddenly richer than John D. Rockefeller and operating on a scale that invites Microsoft comparisons, can a backlash be far behind?

Both the AAP's Adler and Kelly, the digital journalist, think it's already here. They cite, among other things, Google's morally questionable decision to abide by political restrictions placed on it by the Chinese government; the American public's dismay when it discovered just how much of its private online behavior gets filed away in Google computers; and the usual human reaction, as Kelly puts it, "to large success of every type."

Fair use or not, this might not be the ideal time for Google to claim the right to digitize every single book in the world.

The publishers' and authors' lawsuits are in the discovery phase, which likely will drag on for months. It's not clear when the court will hear the merits of the fair use argument; Adler's best guess is the spring or summer of 2007.

Unless the two sides end up negotiating after all.

And here's where the outcome of this legal battle and the future of the book may begin to merge.

Everyone involved agrees that search helps people discover books they want. Everyone also agrees that in an ideal world, once those books are found, there'd be a quick way for the finders to pay to access the actual text -- all of it or just part of it, whatever they need.

Under Google's copy-everything-without-permission plan, easy access to anything but "snippets" is denied for most copyrighted books. But with the right deal in place, copyright holders would get paid and Google could make Book Search a whole lot more useful.

When you ask Google executives directly whether they plan to offer some kind of print-on-demand service -- as Amazon.com, for instance, with publishers' permission, already does -- they can get a bit coy. "We don't really speculate about the future," Smith says, just minutes after he's noted -- in response to a more general question -- that "one of the interesting technologies to keep an eye on is print on demand."

But that's the future. Right now, five days a week, the Googlers are still backing trucks up to that Stanford loading dock.

It's anybody's guess when they'll get to the shelf where call number E169 D3 resides.


© 2006 The Washington Post Company