Image Labeling for Blind Helps Machines 'Think'

By Zachary A. Goldfarb
Special to The Washington Post
Tuesday, November 21, 2006

As director of Web operations at the American Foundation for the Blind, Crista Earl knows more than most about how visually impaired people can access the Internet. Still, when she browses the Web, Earl, 48 and blind, finds it time-consuming and difficult to use.

"It's a huge waste of time," she says. "There's no way to do the functions of many Web sites."

Earl's problem is that the program she uses to make the Internet accessible -- a screen reader that speaks Web pages aloud -- cannot describe pictures and images, an essential part of Web sites. Computers are not yet able to look at an image and know what it is.

For the blind, the only solution is for each image to be labeled with an accurate description for the screen reader to say aloud. But few Web site designers do that.

That is why researchers are studying ways to tap the powers of the Web to have ordinary users label great numbers of images. Asking people to label image after image, however, is asking them to become bored quickly. To make it less tedious and more fun, Luis von Ahn, a computer science professor at Carnegie Mellon University, has created the ESP Game.

Two random visitors to the game's Web site are matched up and shown a random image, which they are asked to label. They cannot communicate. When both provide the same label, they win points. At the same time, computers are associating words with images, a valuable service for the blind.
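The matching rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the game's actual code: the function name play_round, the 10-point reward and the way guesses are normalized are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import zip_longest


def normalize(word):
    """Treat 'Dog ' and 'dog' as the same label (an assumed convention)."""
    return word.strip().lower()


def play_round(image_id, guesses_a, guesses_b, label_counts, points_per_match=10):
    """Replay two players' guesses in turn until they agree on a label.

    Players never see each other's guesses. As soon as some word has been
    typed by both, it is recorded as a label for the image and both players
    score. Returns (label, points), or (None, 0) if they never agree.
    The point value is hypothetical.
    """
    seen_a, seen_b = set(), set()
    for a, b in zip_longest(guesses_a, guesses_b):
        if a is not None:
            seen_a.add(normalize(a))
        if b is not None:
            seen_b.add(normalize(b))
        common = seen_a & seen_b
        if common:
            label = sorted(common)[0]  # deterministic pick if several match at once
            label_counts[image_id][label] += 1
            return label, points_per_match
    return None, 0


# label_counts[image][label] = number of player pairs who agreed on that label
label_counts = defaultdict(lambda: defaultdict(int))
result = play_round("img1", ["dog", "boy"], ["puppy", "Dog "], label_counts)
print(result)
```

Because a label is stored only when two strangers independently type the same word, agreement itself serves as the quality check on the description.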

Von Ahn has found that the game is addictive -- hundreds of thousands have played, with some spending more than 40 hours per week on the site -- and goes a long way toward giving precise descriptions to images. His work is part of an emerging field of computer science called human computation, because the computer is posing the problem, and it is up to people to solve it.

Google Inc. recently built a version of the ESP Game on its site, and this year von Ahn won a MacArthur Fellowship -- known as the "genius grant."

Peter Norvig, director of research at Google, says the image project is an extension of its main product -- the search engine -- which organizes search results by analyzing the content and links that people put up on the Web. "Most of what we try to do at Google is build automated solutions," he says. "We use the human input and then write programs to harness that input."

The promise of von Ahn's research is that it will allow computers to replicate the complex abilities of the brain. "What he's doing is mining the ability of humans," says Manuel Blum, a Carnegie Mellon professor who advised von Ahn on his dissertation.

Von Ahn says he has one goal: "To be able to use all of this data and to have computers be able to do pretty much everything we can do." The end result is a kind of artificial intelligence that would drive a computer to think and act like a human -- the kind only seen in science fiction movies.

There are already rudimentary examples of human computation in use. Many online stores, for example, feature a recommendation system that suggests products to a consumer after considering the buying patterns of like-minded customers -- essentially creating a knowledge database of consumer tastes as a salesman in a brick-and-mortar store would.
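A rough sketch of how such a recommendation system might work: count which products appear in past orders alongside the items a shopper already has, then suggest the most frequent companions. The function name recommend, the sample orders and the scoring rule are assumptions for illustration; real stores use more elaborate methods.

```python
from collections import Counter


def recommend(orders, basket, top_n=3):
    """Suggest items that past customers bought together with the basket.

    Each past order sharing at least one item with the basket "votes" for
    its other items, weighted by how many basket items it shares.
    """
    owned = set(basket)
    scores = Counter()
    for order in orders:
        items = set(order)
        overlap = len(items & owned)
        if overlap == 0:
            continue  # this customer's purchases have nothing in common with ours
        for item in items - owned:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically for a stable result
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase history
past_orders = [
    ["tent", "stove", "lantern"],
    ["tent", "stove"],
    ["tent", "lantern", "boots"],
    ["novel", "lamp"],
]
print(recommend(past_orders, ["tent"]))
```

The program knows nothing about camping. Whatever judgment it shows comes entirely from the choices people have already made.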

Von Ahn envisions computers in the future translating foreign text while respecting the nuances of language or summarizing lengthy documents effectively. And he sees computers making fast diagnoses of ailments in hospitals.

But these are a ways off. The problem is that computers often do not have enough examples to come up with reasonable judgments.

From the moment we are born, von Ahn says, we begin to store countless images, sounds, smells and other perceptions from daily experiences -- and immediately associate words with them. Over time, we develop a seamless ability to describe things. We call it common sense.

Show a 6-year-old a picture of a boy walking a dog, and the child will instantly be able to describe it. Show the same picture to a computer, and it would not be able to describe what is happening. "Nobody bothers to teach a computer," von Ahn says.

With the ESP Game and other projects, he is trying to devise ways for humans to provide enough experiences to computers so that they can come up with common-sense judgments or descriptions.

One of the best examples involves security at seaports and airports. Currently, computers do an awful job of screening luggage or containers for explosives or weapons -- one reason port security has languished, given the expense of workers manually searching containers. Computers would have trouble differentiating between a bomb and a ball.

According to von Ahn's thinking, though, people would go through a series of exercises teaching computers with X-ray vision to scan luggage or containers for contraband, correcting the computer when it flags something innocuous. With many examples stored, a computer ultimately could analyze what is in a piece of luggage or container and determine whether something is dangerous.


© 2006 The Washington Post Company