By Leslie Walker
Washington Post Staff Writer
Thursday, May 6, 1999; Page E1
"Hello, is this the Internet? Take me to Amazon.com, please. . . . Thanks. Now find me the latest mystery from Patricia Cornwell. And while we're at it, let's see if 4 Non Blondes has a new CD. . . . Oh, great! I'll take it. Now e-mail my brother, will you? Tell him the Non Blondes CD is out. Bye for now."
Don't you wish surfing the Web were that easy, so we could talk to our computers and they would know exactly what we mean? As unlikely as it may seem, several companies are bringing voice-activated Web browsers to market this year.
I used my voice recently to navigate the World Wide Web with a browser to be released in June by a Seattle company called Conversa. The surfing was choppy and the program often took me to the wrong site, but I was surprised it worked at all.
At least three other companies – Philips Electronics, IBM and Dragon Systems – also have software prototypes that allow people to navigate the Web using simple spoken commands. In addition, IBM recently began selling a companion to its ViaVoice speech-recognition software that lets people compose e-mail and dictate into Internet chat rooms through headset microphones.
The next step will be when the Web talks back. "Good afternoon and welcome to Amazon.com. Why don't you give those mystery novels a break, Leslie, and chill with some poetry?" But that's still in the planning stages.
A consortium of technology companies (AT&T, Lucent Technologies, Motorola and IBM) announced in March that they are collaborating on a Web-coding language called VXML that will allow site developers to place voice markers in their pages to signal that sites can converse with visitors. The sites would offer spoken menus that visitors could listen to and choose from, much as Web pages now offer textual and graphical hyperlinks that lead to other pages.
You may be wondering why there's such a hurry to make the Internet respond to human speech, when it seems to be doing just fine with mice. It's partly because there is an even bigger hurry to hook up new devices to the Internet, including many that lack a keyboard or mouse. In a world where we access the Internet through laptops, cell phones, PalmPilots, televisions, car dashboards and even microwave ovens, what is the one "interface" we bring to each device?
"People want to converse naturally with information," said W.S. "Ozzie" Osborne, general manager of IBM's speech-technology efforts. "Computers are going to get more pervasive, and a lot of these devices don't have keyboards or displays. That's where speech will be moving over."
The companies making all the new computing devices, and the software that runs them, believe they must train computer chips to recognize human speech or people will be overwhelmed in the era of pervasive computing. Without voice, television, telephones and computers may wind up colliding with the Internet rather than converging with it.
Speech-recognition research has been underway for more than two decades at more than a dozen companies, large and small. One influential player has been Dragon Systems Inc., founded 17 years ago by a husband-and-wife team who had worked in IBM's speech lab before breaking away.
Dragon's NaturallySpeaking, widely regarded as the first consumer software able to recognize "continuous speech," has sold more than 1 million copies worldwide, the company said, and its slimmed-down sibling has sold half a million copies through advertising on America Online alone. IBM, meanwhile, has been selling its ViaVoice dictation software to consumers while also developing an array of business applications.
Dragon co-founder Janet Baker is convinced that speech will become the dominant way we interact with computers. "Speech is the most common and natural and efficient means of communicating," she said. "There is nothing natural about banging on plastic keys."
When it works, speaking into a computer is faster than typing. The average person pounds out 20 to 50 words a minute on a keyboard, vs. 80 to 100 words a minute with speech-recognition software. The software takes time to learn a person's voice, and even then it tends to garble 5 percent to 15 percent of words and phrases.
One Internet-based service takes dictation by telephone on a toll-free line, transcribes it by computer and then e-mails the document to subscribers for $3.50 a page. Perhaps because real people eyeball the transcript before it is e-mailed, the service misspelled only a few proper names in the column I dictated last week.
Voice systems have been seeping into many corners of the business world via the telephone in recent years, mainly with simple verbal commands rather than full-sentence continuous-speech systems. Charles Schwab & Co., for instance, uses such a system to help customers manage their financial transactions by telephone. United Parcel Service has voice-activated shipment tracking.
"We typically see increases in efficiency of 30 to 40 percent with these systems," said Dieter Kubesch, a project manager for Philips Electronics, which has licensed speech recognition for radiology readings at a hospital in Vienna.
It remains to be seen how much the Web might change when it is voice-activated. The Internet changed dramatically after its last interface breakthrough in the early 1990s – the HTML coding language that, with its hyperlinks and easy way of transmitting graphics, created the highly visual World Wide Web.
Speech holds similar promise for the Net. It will add a layer of complexity for Web developers, but if it ever really works, it could make it easier for many more people to plug in and have their say on the global party line.
Leslie Walker's e-mail address is firstname.lastname@example.org.
© Copyright 1999 The Washington Post Company