Computers can be pretty smart. But they are finicky about their reading material.
The text must be neatly printed, for the machines stumble over hand-printed characters. They stare and stare, but they cannot tell a G from a 6 or a C. The worse the handwriting, the bigger the mistakes.
Now the National Institute of Standards and Technology (NIST) has amassed a first-of-a-kind database that contains more than a million examples of hand-printed characters. The database should help designers test the performance of their machines as they struggle, like a class of first-graders, to read hand-printed letters and numbers.
Making machines that read handwritten text is a tantalizing but elusive goal. Banks, insurance companies and the Internal Revenue Service, for example, could relegate the rather mindless task of data entry to machines.
So far, character-recognition machines are best at identifying text written by other machines or by very careful writers. But most people do not write like machines.
The new database came from handwriting samples of 2,000 Census Bureau employees around the country.
In addition to giving computers a wide variety of handwriting styles to mull over, the database will let researchers test the idea that the country harbors regional handwriting styles, just as it harbors regional dialects.
"There is a theory that handwriting, like speech, has local variations in different parts of the country. This database will help determine whether the theory is valid," remarked Charles Wilson of NIST.