TONGUE IN CHECK
With Translation Technology On Their Side, Humans Can Finally Lick the Language Barrier

By Joel Garreau
Washington Post Staff Writer
Sunday, May 24, 2009

The young American soldier recalled the time in Iraq he came across the badly burned little girl. He was on patrol. Trouble ahead. A house had been set on fire. In front of it was the girl, just standing there, all alone.

There he stood, helplessly, in full battle rattle, with his ballistic glasses and helmet, his weapon bristling, his body armor making him waddle like a bipedal rhino.

He spoke no Arabic. He couldn't comfort her; he couldn't tell her he wanted to get her medical help.

"I sure wish I'd had one of those," he told Jennifer Gollob.

Gollob points to a machine that easily fits in a bag the size of a woman's purse. It's a universal translator. It is being tested in Iraq by DARPA -- the Defense Advanced Research Projects Agency -- the legendary research and development works in Arlington where Gollob is a contractor.

The machine interprets the spoken word. You talk in English. It repeats whatever you said in spoken Iraqi Arabic. It then awaits a spoken response from the Iraqi, and talks back to you in English.
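In rough outline, such a device chains three pieces: speech recognition, machine translation and speech synthesis, run in both directions. Here is a minimal sketch of that loop; the functions are hypothetical stand-ins, not the actual system's interfaces.

```python
# A conceptual sketch of the two-way speech-to-speech loop described above.
# recognize(), translate() and speak() are hypothetical stand-ins for the
# device's actual speech-recognition, translation and synthesis components.

def recognize(audio, language):
    """Turn spoken audio in the given language into text (stand-in)."""
    ...

def translate(text, source, target):
    """Translate text from the source language to the target language (stand-in)."""
    ...

def speak(text, language):
    """Speak the text aloud in the given language (stand-in)."""
    ...

def exchange(english_audio, arabic_reply_audio):
    # The American soldier's turn: recognize English, translate, speak Arabic.
    english_text = recognize(english_audio, "en")
    speak(translate(english_text, "en", "ar"), "ar")

    # The Iraqi speaker's turn: recognize Arabic, translate back, speak English.
    arabic_text = recognize(arabic_reply_audio, "ar")
    speak(translate(arabic_text, "ar", "en"), "en")
```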

It's pretty good, says Mari Maeda, the program's manager. About 70 or 80 percent accurate. Not as good as a human. But the number of human interpreters willing to work around gunfire is finite.

DARPA is aiming to get an affordable iPod-size interpreter on the chest of every American warrior, foreshadowing the day such devices will be as common as music players.

Independently, Google is deploying its strikingly successful Translate project. It instantly translates text among 41 languages from Bulgarian to Hindi with surprising felicity. The big question is how soon Google will release a voice version, making the world's cellphones multilingual.

That sound you hear? It's the sound, after all these millennia, of the Tower of Babel rising once again.

* * *

On Jan. 7, 1954, IBM announced, with great fanfare: "Russian was translated into English by an electronic 'brain' today for the first time." Routine machine translation, we were told, was only five years away.

Half a century later, computers have mastered challenges that impress even geneticists, chess grandmasters and research librarians. But machines still have the devil's own time with routines common to any healthy 2-year-old. Becoming fluent with languages, for example.

To this day, if you want to get a translation absolutely right, go find yourself a talented human. "Nuclear power," says Kevin Hendzel, a spokesman for the American Translators Association, when asked about areas where you want tremendously good human translation. "Negotiations for disarmament. The pharmaceutical industry. Zero-error work with millions of dollars" riding on the outcome. Hendzel has served as an interpreter on the presidential hot line.

The trouble with meticulous, culturally sensitive human translation, of course, is that it is slow, pricey and rare.

Suppose you are willing to settle for blazingly fast, cheap, "good enough" translations. Especially those aimed at languages spoken by the rich, multitudinous or dangerous. Enter the new generation of machine translators that in the last year have begun to open broad new vistas.

For decades, translation programs tried to be rules-based. Teach the machine that in English the adjective comes before the noun; in French it's the reverse. Seems logical. But not only is it tedious and expensive to get a bunch of linguists to collect such intricacies, it produces laughable results. Just try Yahoo Babel Fish, for example. Language turns out not to be an Industrial Age machine of discrete parts.

One linguist, discussing the problem on the technology news Web site Slashdot, writes: "Parsing English is easy by comparison. I work with another language where there is a slight stress difference between the sentences 'That might be true' and 'He's honestly picking his butt.' The words 'soup' and '[poop]' are differentiated by a 40-50% increase in the length of the last vowel. There is one word for both 'blue' and 'green', and another word for 'yellow', 'orange', and 'brown'."

The explosion of the Web, however, has enabled a revolution. Like so many successful human approaches, it relies on brute force and ignorance. This method cares little for how any language works. It just looks -- Rosetta stone fashion -- at huge amounts of text translated into different languages by humans. (Dump decades of U.N. documents into the maw.) Then it lets the machine statistically express the probability that words in one language line up together in a fashion comparable to another set of words in another language.
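The idea is easier to see in miniature. Given even a tiny, made-up "parallel corpus" of human-translated sentence pairs, a few lines of code can count which words keep turning up together and convert the counts into rough translation probabilities. The sketch below is only a cartoon of what real systems, such as the classic IBM alignment models, do at vast scale.

```python
from collections import defaultdict

# Toy "parallel corpus": sentence pairs translated by humans.
# Real systems ingest millions of such pairs (decades of U.N. documents, say).
corpus = [
    ("the house", "la maison"),
    ("the blue house", "la maison bleue"),
]

# Count how often each English word shares a sentence pair with each French word.
cooccur = defaultdict(lambda: defaultdict(int))
for english, french in corpus:
    for e in english.split():
        for f in french.split():
            cooccur[e][f] += 1

# Turn the counts into crude conditional probabilities P(french word | english word).
for e, counts in sorted(cooccur.items()):
    total = sum(counts.values())
    probs = {f: round(n / total, 2) for f, n in counts.items()}
    print(e, probs)

# On this tiny sample the counts only hint at the right pairings -- for
# instance, "bleue" never appears except alongside "blue". Statistical
# translation systems run iterative re-estimation over enormous corpora to
# sharpen hints like that into confident word and phrase alignments.
```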

For this statistical approach to work, of course, you need astounding computer power and zillions of pages of text.

Whom does this make you think of?

Google, perhaps?

This also means that the people who do the statistical approach do not talk about programming their software. They talk about "training" it.

Cue the spooky music.

Yes, we are creeping up on artificial intelligence here.

Owning Speech

"It is coming," Peter Norvig says of the day when cellphones translate conversation. "We don't announce things before their time. But there will be products coming out soon. The early generations will be only for the early adopters, and then later on it will reach the masses."

Norvig is the director of research at Google, arguably the world's leader in machine translation. "Certainly we're the broadest. We have over 40 languages and we translate between all pairs of them . . . in any subject domain . . . and nobody else does that."

Google still hires professional human translators to create high-value pages, like the ones in French telling people how to use Google. "It's a matter of ownership," he says of taking pride in presentation.

But Norvig refers to professional human translators as "a small guild" carving up a market of a few billion dollars. With Google Translate, he's talking about making billions of routine pages more available than ever for billions of ordinary people.

"I think most of the time now, you take a newspaper article" and run it through Translate "and you can understand what's going on. It will be very rare that you think a native speaker did the translation. You'll notice disfluencies in every sentence. But you'll know who did what to whom."

Indeed, on "Meteor," a 1-to-100 scale of these things in which 40 means you're getting the general idea, and 70 is as good as most human translators, Google gets in the 50s on the Arabic-English pairing, says Alon Lavie, president of the Association for Machine Translation in the Americas. "Far better than gist. Pretty damn good. They're the 800-pound gorilla."

Google wants to own speech. Whenever you call 1-800-Goog-411 and say "pizza," you are teaching their computers to associate the way you say that word with its text version, Mike Cohen of Google told Technology Review.

Using those smarts, in November, Google unveiled an app to search on any topic you can imagine by talking into your iPhone. Automatically and relentlessly, day and night, that feature provides even more real-world training for their voice-recognition bots.

When all this becomes a routine part of Google's Android mobile software, how big a deal will it be to culture and society to have a cellphone that will allow you to talk to most of the world's 6 billion people?

"In some ways I am more enthusiastic about the text part" of translation, Norvig says. "I think that opens up a lot. If you're a speaker of a minority language -- say, Arabic -- how much of the Web is accessible to you? Well, it's really a small portion of 1 percent or so. But if we can now translate those Web pages, now all of a sudden the whole world opens up to you. It's a lot more information and it's also different worldviews."

Basic Training

If you're looking for an organization with deep pockets and an appreciation of the "Cool Hand Luke" life lesson -- "What we've got here is a failure to communicate" -- there's little to compare with the American military.

"We knew that we couldn't build something that would work 99 percent of the time, or even 90 percent of the time," says Maeda, the program manager for DARPA's Spoken Language Communication and Translation System for Tactical Use, or TRANSTAC.

"But if we really focused on certain military use cases, then it might be useful just working 80 percent of the time. Especially if they don't have an interpreter and they're really desperate for any kind of communication.

"You interview soldiers and Marines returning from the field," Maeda says, "asking them, 'What are you interacting with the locals about? What are the typical situations?' "

Red-faced screaming matches are not what DARPA has in mind.

"We definitely don't want to handle these kinetic confrontational situations. We want to be able to have these systems used in cooperative, cordial conversations. We focused on checkpoint operations, stopping the vehicles. 'Please open the glove compartment . . . the trunk.' We also started to do meet and greet -- visiting local leaders."

Can you discuss politics using these little machines?

"That's not one of their domains. But we do elections. 'Where is the voting booth in your village?' for instance."

Basic questions of life are tremendously important to people. "SWET questions -- sewage, water, electricity, trash-related questions. 'How frequently does the power go off? Do you have a backup generator? How often is the trash collected?' Medical is also there."

If you want the machine to respond quickly and coherently, it pays to narrow the scope. "But it is difficult," Maeda says. "When you sit down with an Iraqi soldier, you can start talking about anything -- about my daughter's wedding and about home life and things like that. And then, of course, it will degrade. But the vocabulary that's in the system is tens of thousands of words, both English and Arabic." In normal conversation, many humans use only a thousand words.

Maeda has got big plans. In addition to getting these machines down to the size of an iPod, and cheap enough to give one to every soldier in combat, she wants them to be networked, so that if one soldier discovers an error in translation, all the machines will learn. She also wants the devices to be capable of rapid deployment with "surprise languages."

"These things are very useful at the beginning of the conflict when you don't have interpreters. How do you stand up a new translation system that works in a language in a month with a minimum amount of collected data? That's something that we're focusing on right now."

Unsurprisingly, "we're trying to build a Dari system," Maeda says, referring to one of Afghanistan's major languages. "We're also going to try to build a Pashto system, which is very challenging, because there are quite a few dialectical variations."

Take the machine for a spin, she offers. So into it, you say:

"This is only a machine doing the translation, it won't be perfect."

To check that it understood you accurately, the machine attempts to repeat your words back to you. Sure enough, it gets:

"This is only a machine doing the translation it won't be perfect."

It translates into Arabic and speaks your phrase in Arabic.

Then, to give you another reality check, it translates back into English what it just said in Arabic, and on its screen displays:

"This is a translation device they don't would be great."

Good enough?
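What the device is doing there, translating its own Arabic back into English so the speaker can see what came out the other end, is a general-purpose sanity check. A minimal sketch of that round trip, with translate() as a hypothetical stand-in for whatever engine is available:

```python
def translate(text, source, target):
    """Stand-in for whatever translation engine is available (hypothetical)."""
    ...

def round_trip_check(sentence, source="en", target="ar"):
    # Translate into the target language...
    forward = translate(sentence, source, target)
    # ...then translate the result straight back into the source language.
    back = translate(forward, target, source)
    # Seeing the back-translation next to the original gives the speaker a
    # rough sense of how far the meaning may have drifted along the way.
    print("You said:        ", sentence)
    print("It came back as: ", back)
```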

The Human Touch

What does constitute "good enough"?

Ah, there's the rub. Compared to what?

The world's common language is not English, it's broken English, says Alex Waibel of Carnegie Mellon, a DARPA principal investigator, born in Germany, who spends his life in international conferences where English is everybody's second or fourth language. Eighty percent machine accuracy is better than some very large portion of these alleged English speakers, he says.

"Human translators aren't actually that great," Waibel says. In one study, people listened to a machine interpreter and then were asked questions to measure their grasp of content. The score was 64 on a 100-point scale. Not wonderful. But when they did the same test with a human simultaneous interpreter, the result was not a lot better -- a 74.

"When humans try to figure out how to translate one thing, they drop their attention as to what's coming in the next graph," Waibel says. "And they're human. They get tired. They get bored."

"This is a force multiplier," says Gollob. "You've got only one interpreter to talk to one person. But you've got other soldiers and they may also want to talk to another individual. And they might not trust exactly what the interpreters are interpreting. Very often they converse for 10 minutes and you get three utterances out. 'Well, you've been talking for the last 10 minutes. What were you saying?' The soldiers really want to know."

Then there is human bias.

If you have a "Sunni interpreter, and the soldier wants to interact with a Shiite person, the Sunni interpreter is going to phrase things differently because he feels, you know, different about the person he's interacting with."

The Sunni might be talking down to the Shiite?

"Yeah, exactly. And the machine doesn't do that kind of thing."

Says another poster to Slashdot:

It "reminds me of the old joke:

"Guard: 'Now tell me where you hid the money, or you will suffer.'

"Translator: 'Tell him where the money is, or you will suffer.'

"Prisoner: 'I'll never speak.'

"Translator: 'He says he won't tell you.'

"Guard: Putting gun to prisoner's head. "Tell him I will blow his brains out if he doesn't tell me immediately.'

"Translator: 'He will shoot you in the head unless you tell him now.'

"Prisoner: 'I buried a million dollars under the floorboards in the old woodshed.'

"Translator: Pauses. 'He says you don't have the guts to shoot him . . . .' "

Making the Connection

A good machine can really lubricate human connection, Waibel reports. When global researchers hit the town after a conference in Japan, they plopped one of his translators down in the middle of the table. A grand old sake-fueled time was had, as they communicated in ways beyond the unaided capabilities of any of them.

But can all our cleverness re-create a Genesis world, in which "the people is one, and they have all one language . . . and now nothing will be restrained from them, which they have imagined to do"?

There's that nagging problem of how much more clever our flesh and blood is than our creations. Waibel recalls his family visiting Australia. His 2-year-old son, Joshua, looked out from the hotel lobby at a creature loping across the lawn.

"Kangaroo!" he said.

Waibel's eyes go wide at the very scope of this accomplishment. He's devoted his life to figuring out how to allow machines to make connections among words.

How do you replicate the way a toddler accurately and instantly makes the connection between some cartoon he'd glanced at months ago and an utterly novel real-world situation?

How did his little brain do that?


© 2009 The Washington Post Company