Amazon's Vital Statistics Show How Books Stack Up

By Linton Weeks
Washington Post Staff Writer
Tuesday, August 30, 2005

Hey! Remember books? Sure you do. Your parents read them. So did their parents. Now for all those folks who like to talk about books without actually reading them, some exciting news! Amazon.com, the pack-leading online superstore, has figured out an innovative, and some would say insidious, way to talk about books.

Text Stats.

It's part of the company's ingenious "Search Inside" capability, which allows you to comb through the entire text of a book online.

Sure, "Search Inside" has been around for a couple of years and, by looking at the first few pages of a book that uses the feature, you can get to know a little about it before you buy.

But with the addition in April of Text Stats, "Search Inside" now takes books completely apart. It slices! It dices! It can uncomplicate comedies, trivialize tragedies, diminish legitimate discourse and completely humiliate the humanities!

Through Text Stats you can know such arcane things as the SIPs, or Statistically Improbable Phrases, that appear in a book. The strange pairing "reindeer socks," for instance, shows up four times in Eric Jerome Dickey's "Naughty or Nice." Text Stats will also tell you the number of characters (letters, not protagonists) in a book and the relative "complexity" of the words. It ranks works according to difficulty. Three different computer-driven indexes suggest whether a book is easier -- or harder -- to read than others.

And, thanks to a sensational subsection called Fun Stats, you will know just how many words you are getting per dollar and per ounce with each book. For instance, "War & Peace" by Leo Tolstoy gives you 51,707 words per dollar, while "Obliviously On He Sails: The Bush Administration in Rhyme" by Calvin Trillin delivers only 1,106 words per dollar.

Confronted with the evidence, Trillin says, "I don't mind being compared to Tolstoy literarily, but when it comes to Fun Stats it's a little humiliating."

Trillin also says he was surprised to learn that of the words he used in his book, 10 percent of them were deemed "complex."

"I thought I was hitting around 6 or 7 percent," he says. "That's what I usually aim for."

Virginia author Robert Bausch is also concerned about his statistics. "There is something really kind of disturbing about this," he says while checking the Text Stats of "The Gypsy Man," his novel from 2003. "You can't tell me that 96 percent of books have fewer sentences than mine. If that's true, I'm in [expletive] deep [expletive]."

Kristin Mariani of Amazon.com says the "Search Inside" features were designed to make it "even easier for customers to find, discover and buy books they'll love." She adds that the program has helped increase sales.

How, you may be wondering, will Text Stats enhance your life? Say you must choose between two best-selling novels: "The Kite Runner" by Khaled Hosseini and "The Curious Incident of the Dog in the Night-Time" by Mark Haddon. Both are in paperback; both sell for about $10.

"Kite Runner" is 384 pages long; "Curious Incident" is 240. On Amazon.com, you can read the publisher's synopses and the editorial and reader reviews -- helpful tools for making an intelligent choice.

But now, with Text Stats, you can reduce your decision to utter absurdity. You can learn, for example, which book scores higher on what is known as the Fog Index. Conceived by the late Robert Gunning, an American businessman who worked in textbook publishing, the index states the number of years of formal education you should have in order to read and comprehend a random passage.

Here's how the index works: It picks a sample -- of 120 words or so -- from the text. It finds the average number of words per sentence, then picks up all the words in the sample that contain three or more syllables. Compound words are ignored; so are verbs that become polysyllabic through tense endings. The first word of each sentence is tossed out; so are proper nouns. Then it takes the polysyllables as a percentage of the words in the sample, adds that figure to the average number of words in a sentence, multiplies the sum by 0.4 and, voila! The answer is a number that supposedly represents a comprehension grade level.
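For the curious, the arithmetic can be sketched in a few lines of Python. This is only a rough illustration of the commonly cited form of Gunning's formula, 0.4 x (average words per sentence + percentage of complex words), and not Amazon's actual code, which the company has not published. For a sample of about 100 words, the percentage is nearly the same as a raw count of polysyllables. The function names and the vowel-counting syllable estimate are assumptions made for this sketch, and it skips the exclusions for compound words, verb endings, sentence openers and proper nouns.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of vowels (a heuristic, not a dictionary lookup)."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing silent "e" as non-syllabic ("complete" -> 2), but keep "-le" ("table" -> 2).
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fog_index(text: str) -> float:
    """Estimate a Fog Index: 0.4 * (average sentence length + percent complex words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence")

    avg_sentence_length = len(words) / len(sentences)
    # "Complex" here simply means three or more estimated syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    percent_complex = 100 * len(complex_words) / len(words)

    return 0.4 * (avg_sentence_length + percent_complex)
```

Feed `fog_index` a passage and the result is the supposed grade level: longer sentences and more three-syllable words push the number up. Because the syllable count is a guess, the sketch will disagree with a careful hand count on some words.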

The Fog Index shows that you should read at a seventh-grade level to digest "Kite Runner" and at a ninth-grade level for "Curious Incident." The first novel contains 6 percent complex words, meaning words of three or more syllables. Five percent of the words in "Curious Incident" are considered complex. There is an average of 1.4 syllables per word in both books.

At 11,702 words per dollar, "Kite Runner" is obviously a better bargain than "Curious Incident," which contains only 6,156 WPD. That is, unless the words in "Curious Incident" are more meaningful, poetic or carefully chosen.

Text Stats still has a few kinks in its system. According to the Fog Index, a simple child's book such as "The Runaway Bunny" requires seventh-grade reading proficiency. And James Joyce's "Ulysses" is said to be easier than 80 percent of other indexed books.

But in its pure form, Text Stats is a triumph of trivialization. By squeezing all the life and loveliness out of poetry and prose, the computer succeeds in numbing with numbers. It's the total disassembling of truth, beauty and the mysterious meaning of words. Except for the Concordance feature, which arranges the 100 most used words in the book into a kind of refrigerator-magnet poetry game. Here's a poem made from the Concordance of Dave Ramsey's "Total Money Makeover: A Proven Plan for Financial Fitness": Emergency. Find first friend. Give kids life. Live myth.

Yes, you heard right: This site is under deconstruction.

Authors! Imagine how Text Stats will help you write books that are, if not better, at least easier to read according to the Fog Index and that offer the reader more words per pound than "Moby-Dick."

Publishers! Who needs editors anymore? If the software can find SIPs, surely it can be programmed to ferret out PCSs (Poorly Constructed Sentences), ORDs (overly romantic drivelings) and DIPs (Dreadfully Implausible Plots).

And readers! You can settle bar bets. Yes, "Ulysses" by James Joyce (9 on the Fog Index) is more complicated than William Faulkner's "The Sound and the Fury" (5.7 on the Fog Index). Yes, Charlotte Bronte provides more words per ounce (13,959 in "Shirley") than her sister Emily (10,444 in "Wuthering Heights"). And, yes, Ernest Hemingway used fewer complex words (5 percent) in his short stories than F. Scott Fitzgerald (9 percent).

That's right! Now you too can sound like a literary insider at Washington cocktail parties. You can throw around statistics and make clever conversation about the hard history books, the long-winded novels, even those thick, heavy, make-you-think philosophy tomes that contain really, really long words. And the beauty of it is, with Amazon's "Search Inside" Text Stats and other features, you won't even have to read them.

© 2005 The Washington Post Company