In just a handful of years, the business of emotion detection — using artificial intelligence to identify how people are feeling — has moved beyond the stuff of science fiction to a $20 billion industry. Companies such as IBM and Microsoft tout software that can analyze facial expressions and match them to certain emotions, a would-be superpower that companies could use to tell how customers respond to a new product or how a job candidate is feeling during an interview. But a far-reaching review of emotion research finds that the science underlying these technologies is deeply flawed.

The problem? You can’t reliably judge how someone feels from what their face is doing.

A group of scientists brought together by the Association for Psychological Science spent two years exploring this idea. After reviewing more than 1,000 studies, the five researchers concluded that the relationship between facial expression and emotion is nebulous, convoluted and far from universal.

“About 20 to 30 percent of the time, people make the expected facial expression,” such as smiling when happy, said Lisa Feldman Barrett, a professor of psychology at Northeastern University, who worked on the report published earlier this month. But the rest of the time, they don’t. “They’re not moving their faces in random ways. They’re expressing emotion in ways that are specific to the situation.”

It’s not surprising that something as complex and internalized as human emotion defies easy classification. Humans tend to instinctively draw on other factors, such as body language or tone of voice, to complete their emotional assessments. But the majority of emotion-detection AI makes inferences purely by mapping facial positioning, a concept that stems from the work of Paul Ekman, a psychology professor at the University of California at San Francisco. He posited that six emotions (happiness, sadness, disgust, fear, anger and surprise) are represented by universal facial expressions across all cultures.

When Microsoft rolled out its emotion detection technology in 2015, it said its algorithms could “recognize eight core emotional states — anger, contempt, fear, disgust, happiness, neutral, sadness or surprise — based on universal facial expressions that reflect those feelings.”

It’s a common justification for such technology, and exactly what Barrett and her colleagues are pushing back against. The companies are not trying to be misleading, she said, but they need to dramatically change their approach to emotion detection to get the kind of results many already purport to have. (Microsoft declined to comment on how or whether the review would influence its approach to emotion detection.)

“We now have the tools and the analytic capability to learn what we need to learn about facial expressions in context and what they mean,” Barrett said. “But it requires asking different questions with that technology and using different analytic strategies than what are currently being used.”

Such technological limitations come with risks, especially as facial recognition becomes more widespread. In 2007, the Transportation Security Administration introduced a program (which Ekman consulted on but later distanced himself from) that trained officers to try to identify potential terrorists via facial expression and behavior. A review of the program in 2013 by the U.S. Government Accountability Office found that the TSA hadn’t established a scientific basis for it, and that the program didn’t translate to arrests. In 2017, a study by the American Civil Liberties Union concluded that the program fueled racial profiling.

To get on the right track, Barrett said, companies should be working with far more data, training their programs to consider body positioning, vocal characterization and situational context just as a human would. At least one company says it’s embracing the more multifaceted approach: Affectiva, the first company to market “emotion AI.” The company, which claims to hold the largest collection of emotion data in the world, works with naturalistic video rather than static images and is trying to integrate such factors as a person’s tone or gait into its analyses.

Rana el Kaliouby, Affectiva’s co-founder and chief executive, said she welcomes the review’s findings, adding that they mirror issues she has been trying to tackle since she was a doctoral candidate at Cambridge University in the early 2000s.

“I’ve tried to solve the same issues all these years, but we’re not there yet as an industry,” Kaliouby said. “I liken it to the emotional repertoire of a toddler: A toddler will understand simplistic states, but they won’t have the language or sophisticated sensing to recognize complex emotions.”

Affectiva has encountered these limitations organically in the course of business and tried to adjust to them, she said. Several years ago, Kaliouby said, a client complained that Affectiva’s technology wasn’t producing results in China when it tried to analyze responses to ads. It turned out that the program, which lacked region-specific training, had trouble recognizing a subtle “politeness smile” that many subjects displayed.

This helped highlight how different cultures display facial expressions, Kaliouby said, leading Affectiva to incorporate “culturally specific benchmarks” for facial movement. The company now trains its systems with more than 8 million faces from 87 countries.

The industry will evolve as it acquires more data, Kaliouby said. In the meantime, the technology is already being put to use.

“The naive mapping that people in academia and, unfortunately, in the industry are doing is quite dangerous, because you’re getting wrong results,” Kaliouby said. Oftentimes, she added, clients aren’t interested in the more comprehensive approach, asking instead for analyses based on the six basic emotions defined by Ekman’s research decades ago.

Kaliouby said she hopes the review will educate both consumers and those in the industry about the limits of emotion recognition, and reinforce how far the industry has yet to go. The theoretical aim for much artificial intelligence is to create machines that are able to operate just as well as, or better than, a human.

But even creating a machine with the perceptiveness of a person wouldn’t be enough to solve the problem of emotion detection, Kaliouby said.

“Humans get it wrong all the time.”