Amazon’s system performed flawlessly in predicting the gender of lighter-skinned men, the researchers said, but misidentified the gender of darker-skinned women in roughly 30 percent of their tests. Rival facial-recognition systems from Microsoft and other companies performed better but were also error-prone, they said.
The problem, AI researchers and engineers say, is that the vast sets of images the systems have been trained on skew heavily toward white men. The research shows, however, that some systems have rapidly grown more accurate over the past year amid greater scrutiny and corporate investment in improving the results.
Amazon disputed the study’s findings, saying the research tested algorithms that work differently than the facial-recognition systems tested by the Federal Bureau of Investigation and deployed by police departments in Florida and Washington state. [Amazon founder and chief executive Jeffrey P. Bezos owns The Washington Post.]
Matt Wood, an Amazon Web Services executive who oversees AI and machine learning, said in a statement that the researchers based their study on “facial analysis” algorithms, which can detect and describe the attributes of a face in an image, such as whether the person is smiling or wearing glasses. “Facial recognition” algorithms, in contrast, are used to directly match images of different faces, and would be more commonly used in cases such as identifying a wanted fugitive or missing child.
“It’s not possible to draw a conclusion on the accuracy of facial recognition for any use case — including law enforcement — based on results obtained using facial analysis,” Wood said. The results “do not represent how a customer would use the service today.”
Wood also said that the test was done on outdated software and that recent internal attempts to replicate the tests showed more accurate results than the researchers’ findings. Amazon said in November that it had updated its facial-analysis and facial-recognition features to more accurately match faces and “obtain improved age, gender and emotion attributes.”
But independent researchers said the findings raise important questions about the deployment of Amazon’s AI.
“Asking a system to do gender classification is in many ways an easier task for machine learning than identification, where the possibilities are far more than binary and could number in the millions,” said Clare Garvie, a senior associate at Georgetown law school’s Center on Privacy and Technology who studies facial-recognition software. Amazon’s “defensiveness and their unwillingness to take a closer look at this potential issue in their product speaks volumes.”
The study’s co-author Joy Buolamwini, who conducted similar research last year, told The Washington Post that the study’s methodology is ethically sound, has been widely replicated and has been cited by companies such as IBM and Microsoft as critical to helping improve the systems’ fairness and precision.
She called Amazon’s defense “a deflection” and said facial analysis and gender classification are fundamental tools that could be used by the algorithms to help speed up a facial-recognition search. A report by Georgetown Law researchers in 2016 found that the facial images of half of all American adults, or more than 117 million people, were accessible in a law-enforcement facial-recognition database.
“Acknowledging a known industry issue and working towards solutions instead of deflection has given (Microsoft and IBM) a platform to position themselves as responsible developers of AI technology. However Amazon decides to respond, the research has been vetted many times over and already led to industry change,” she told The Post.
The U.S. Department of Commerce’s National Institute of Standards and Technology, which evaluates facial-recognition systems on accuracy, said it has tested systems by 39 developers and has seen growing accuracy in recent years. But the tests are voluntary, and companies such as Amazon and Google have declined to participate. Buolamwini urged Amazon to submit its models for the public benchmark tests.
She also urged Amazon to alert clients of system biases and “immediately halt its use in high-stakes contexts like policing and government surveillance.”
The promise of facial-recognition technology that could precisely identify people from afar has touched off a multimillion-dollar race among tech companies, which contend that the technology could speed up police investigations, improve public security and save lives. An FBI counterterrorism official, speaking in November at an Amazon Web Services conference, said the bureau had seen remarkable results when testing the software on data from the Las Vegas mass shooting in 2017.
But questions surrounding the systems’ accuracy — and concerns that the technology could be used to surveil people without their knowledge or consent, stifling public protests and chilling free speech — have led civil rights and privacy advocates to speak out against a technology they fear could yield potentially deadly results. A software flaw, for instance, could prompt police or security personnel to react violently toward a person incorrectly flagged as a wanted criminal.
The technology has fueled a growing debate in Silicon Valley and Washington over how to regulate against possible dangers, with some lawmakers calling for a more robust national law governing use and privacy. Some Amazon employees and shareholders have urged the company not to sell facial-recognition software to police, and executives from other companies have urged the government to rein in the industry.
Satya Nadella, chief executive of Microsoft, which is developing facial-recognition software but has also called for stronger regulation, said last week that the facial-recognition business is “just terrible” and “absolutely a race to the bottom.”
“It’s all in the name of competition. Whoever wins a deal can do anything. That’s not going to end well for us as an industry or for us as a society,” Nadella said. “It is better to have some modicum of rules by which we all play so we protect what actually matters the most.”
The technology’s deployment has quickly outpaced regulation, and facial-recognition systems can now be found in airports, concert halls and restaurants. Children, parents and visitors of schools and community centers across the United States are also being scanned by the unproven systems for potential security threats.
Tests conducted by the American Civil Liberties Union last summer found that Rekognition had mismatched the photos of 28 members of Congress with the mug shots of people who had been arrested, with higher error rates for people of color. Amazon said those tests were incorrectly administered in a way that skewed the results.
Facial-identification systems work by breaking down images into complex numerical representations — sometimes called faceprints or embeddings — that can be compared rapidly across a vast database of other images. Similar AI technologies are used to suggest photo tags on Facebook, unlock the Apple iPhone or confirm a traveler’s identity at airports across the United States.
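That matching step amounts to a nearest-neighbor search: the system compares a probe face’s numerical representation against every stored one and surfaces the closest. A minimal sketch in Python, assuming each face has already been reduced to a fixed-length feature vector by an upstream model — the names and vectors here are invented for illustration:

```python
import math

def cosine_similarity(a, b):
    """Similarity between two face feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_match(probe, database):
    """Return the (name, similarity) pair closest to the probe vector."""
    scored = [(name, cosine_similarity(probe, vec)) for name, vec in database.items()]
    return max(scored, key=lambda pair: pair[1])

# Hypothetical three-dimensional vectors standing in for real face embeddings,
# which in practice have hundreds of dimensions.
database = {
    "person_a": [0.9, 0.1, 0.3],
    "person_b": [0.2, 0.8, 0.5],
}
probe = [0.88, 0.12, 0.31]

name, score = best_match(probe, database)
print(name, round(score, 3))  # person_a matches with similarity near 1.0
```

Real systems index millions of such vectors with approximate-nearest-neighbor structures so a single probe can be compared against the whole database in milliseconds.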
When facial-recognition or facial-analysis algorithms return results, they include ratings of their own confidence in their findings: A direct match may be 99 percent, while a fuzzier or more inconclusive match would rate lower. Executives at Amazon and other companies have challenged similar research by arguing that forcing the algorithms to choose a specific male-or-female answer can provide misleading results, and runs counter to how police, engineers and other users are trained to use the technology in real life.
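The distinction the executives draw can be illustrated with a simple filtering step: instead of forcing the single top answer, a deployed system typically returns only candidates whose confidence clears a threshold, and an empty result means “no confident match.” A hypothetical sketch — the candidate list and the 99 percent threshold are invented for the example:

```python
def filter_matches(candidates, threshold=0.99):
    """Keep only candidate matches at or above the confidence threshold.

    Returning an empty list signals "no confident match" rather than
    forcing the system to hand back its best (possibly wrong) guess.
    """
    return [(name, conf) for name, conf in candidates if conf >= threshold]

# Hypothetical raw results from a face search, with confidence scores.
candidates = [
    ("record_123", 0.995),  # a near-certain match
    ("record_456", 0.72),   # a fuzzy, inconclusive match
    ("record_789", 0.31),   # almost certainly a different person
]

confident = filter_matches(candidates)
print(confident)  # only the 99.5 percent match survives the filter
```

Lowering the threshold widens the net but admits more of the fuzzy, error-prone matches, which is why vendors and critics argue over what threshold real deployments actually use.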
A study last year by Buolamwini and computer scientist Timnit Gebru found similar gender-classification errors — and broad accuracy gaps between lighter and darker skin tones — in the systems developed by IBM, Microsoft and the Chinese tech company Face++.
In the following months, IBM and Microsoft announced they had improved their algorithms and were showing more accurate results across genders and skin tones. The new study confirmed that those companies’ accuracy had improved but found that their systems were still more likely to misclassify the gender of darker-skinned faces.
The tests were conducted in August using facial images of about 1,200 members of national parliaments from six countries across Europe and Africa. Amazon’s gender-classification error rate across all faces, the study said, was about 8 percent, compared with 4 percent for IBM and less than 1 percent for Microsoft.
“The potential for weaponization and abuse of facial-analysis technologies cannot be ignored,” Buolamwini and study co-author Deborah Raji wrote.