Early optical-character-recognition software made so many mistakes that it was often easier to retype documents than to scan and correct them. OCR has improved since then, accurately converting words on a paper document into digital form that a computer can understand and manipulate.
But in the meantime, the documents that people scan have grown steadily more complex. Besides text, OCR software now must cope with tables, graphics and multiple fonts. Users are a demanding bunch. They want the OCR version to look just like the print original.
OmniPage Pro 9.0, Caere Corp.'s latest version of a venerable OCR package, does a good job on both fronts: Its OCR accuracy is very high, and it preserves formats and layouts as well.
Installation of the software was a breeze on all three operating systems with which it works: Windows 95 and 98 and Windows NT 4.0. I used scanners from Hewlett-Packard Co. and Visioneer Inc. of Palo Alto, Calif., although the software works with any scanner compliant with the TWAIN standard.
My test documents included tables, graphics, odd fonts, multiple columns and general desktop-publishing weirdness. I also tried second- and third-generation copies of the test documents. The results were impressive. Not only did OmniPage Pro recognize the text, it also carried the layouts over.
The character-recognition accuracy was amazing. Caere claims 99 percent accuracy on laser-printed documents with standard fonts, and my results supported the claim. Except on the most tortuously fuzzy, third-generation documents, accuracy exceeded 88 percent, even for odd fonts.
OmniPage Pro 9.0 shows its maturity without being dated. The most notable new feature is color support. On-screen versions look like the real-world printed documents, matching their colors. Photographs are displayed at a resolution of 150 dots per inch. The few users who require a higher resolution should scan pictures separately with imaging software.
OmniPage Pro 9.0 handles tables with relative ease. Tables with horizontal and vertical lines convert automatically into the chosen format. For tables without lines, however, you must manually select zones to get the contents recognized correctly.
The package recognizes printed spreadsheets. A click of the "Auto Zone" button parses a spreadsheet while retaining the data and layout fairly well. I captured tables that had no lines by telling OmniPage they were spreadsheets.
The function known in previous versions as Check Recognition has been renamed and given a facelift. Now called OCR Proofreader, it checks a scanned and recognized document for possible errors. Its window can be resized to give a better view of the document.
The bundled Caere PageKeeper Standard package lets you create a digital file cabinet for scanned documents. Each program's toolbar links to the other, so you can reach one program's features from within the other.
If you still think OCR stands for "occasionally correct recognition," you will be pleasantly surprised. Not only is text recognition accurate and reliable, but formats also survive with greater fidelity than ever before. Corporate work groups that scan large volumes of documents, as well as individuals looking to kill the paper tiger in their in-boxes, will find OmniPage Pro 9.0 a good choice, not merely a compromise.
To respond, send e-mail to email@example.com or visit the Government Computer News Web site at www.gcn.com.
OmniPage Pro 9.0
Caere Corp., Los Gatos, Calif.
Web address: www.caere.com
Price: $499 for a single-user license, $79 upgrade from version 8.0 for Windows
Report card: A
+ The best OCR package around
+ Handles tables and color graphics well
-- A bit pricey for casual use
-- Practice needed to get full value
Real-life requirements: TWAIN-compliant scanner; Windows 95, 98 or NT 4.0; 166-MHz or faster Pentium processor; 64 megabytes of memory; 45 megabytes of free hard-disk space for the software plus more for scanned documents; graphics card supporting 1,024-by-768-pixel resolution at 32-bit color depth; 17-inch or larger high-contrast monitor recommended for proofreading