By Rob Pegoraro
Thursday, November 6, 2008
Speech-dictation software has advanced immensely over the past decade, but it's still not like talking to the Starship Enterprise's computer or a human stenographer.
Getting a computer to turn your speech into words on the screen requires buying expensive, resource-intensive software and mastering a sometimes confusing syntax of spoken commands, all to yield text that can be as riddled with errors as a beginner touch-typist's output.
And yet: To type without putting fingers to a keyboard, even imperfectly, is to turn science fiction into fact. For people fighting repetitive-stress injuries or other ailments that prevent keyboard use, this capability is worth all its costs.
I recently tried two programs that make this possible -- Nuance Communications' Dragon NaturallySpeaking 10, which shipped in August, and MacSpeech Dictate, which brought effective voice dictation to the Mac for the first time earlier this year and received a major update two weeks ago.
The test reinforced what I learned from a review of an earlier edition of NaturallySpeaking in 2006: These programs won't work unless you do first.
You have to force yourself to speak precisely and at a steady pace, memorize the programs' editing commands, suppress any tendencies you might have to talk to yourself or to the computer, and -- this may be the toughest part -- resist the temptation to switch to the keyboard.
Otherwise, the results will probably look like one of Google's erratic, automatic language translations: A reader can get the gist of the story, but an English teacher would give it an F for all the grammar mistakes.
NaturallySpeaking, which starts at $99 for its Standard version and requires Windows 2000, XP or Vista but won't work in those systems' 64-bit editions, is the far more polished product, as you'd expect for a program that debuted more than a decade ago.
NaturallySpeaking's setup, like that of its predecessor, asks you to spend a few minutes reciting some text into a headset microphone -- the one built into your computer won't do -- to train it in your speech patterns. It can then analyze your e-mail and other documents to learn more parts of your vocabulary.
Nuance, based in Burlington, Mass., says this version runs dramatically faster than NaturallySpeaking 9. But it lagged behind my patter on a new Windows Vista laptop, often typing out one sentence only after I was halfway through the next one.
This update provides a noticeable speedup of many text-editing tasks. You no longer must select words ("select 'these words'") before issuing a command ("delete"); instead, you simply say "delete 'these words'."
The $199 Preferred edition of NaturallySpeaking adds Voice Shortcuts that can launch programs to carry out preset tasks. For instance, saying "search Wikipedia for Washington Post" brought up the online encyclopedia's entry on this paper.
This release seemed less stable than the previous edition. In Vista, it frequently crashed or froze up, requiring a restart of the program.
NaturallySpeaking also had problems working with less common applications. In the OpenOffice Writer word processor, for example, commands that worked in Microsoft Word failed.
MacSpeech Dictate, by MacSpeech of Salem, N.H., shares its core speech-recognition software with NaturallySpeaking, as well as a picky taste in system requirements: This $199 program runs only on Macs with OS X 10.4 or 10.5 and an Intel processor.
Dictate also features a similar voice-training setup -- you even read almost the same essay to the computer -- but does not try to build its vocabulary by analyzing your documents or the words you've added to a Mac's system-wide spell-check dictionary.
And Dictate not only demands a headset microphone but also requires that this be plugged into a USB adapter for better sound quality.
Dictate requires you to speak your punctuation ("for example, comma, this"), while NaturallySpeaking can try to insert it automatically (though that program did a lousy job in my testing). And Dictate provides fewer ways to control your computer's programs than its Windows-only counterpart.
Last month's version 1.2 update added two valuable features that should have made it into the 1.0 release: the ability to spell out a word letter by letter and the ability to train the program to recognize new words or phrases.
NaturallySpeaking and Dictate did about equally well at recognizing my speech, whether in a quiet room or with the TV on in the background. They both got maybe 90 percent of my words right after their initial training, a number that ought to increase over time with continued training and corrections.
Unfortunately, fixing each program's "speakos" with voice commands takes more time than cleaning up typos with keystrokes. The most common thing I said to each program was "scratch that," the command to erase the last words entered. And editing via voice was frustrating enough that the next-most common thing I said was "[expletive] you" (the f-word, incidentally, did not appear to be in either's vocabulary).
For most people, getting these programs to perform at their highest level will require serious, disciplined effort, not too different from the work involved in learning to type in the first place. Speech is one of the simplest, most natural things a person can do, but speaking to a computer remains a far trickier thing to master.