This is humanoid robot Justin. He is German. He is not currently hiring anyone. (ANDREAS GEBERT/Reuters)

Artificial intelligence has been hailed as a great equalizer in employee hiring -- technology that has the potential to hide demographics, match candidates based on skills rather than resumes, and get around the biases of hiring managers who gravitate toward people who look or act like them. Companies that offer such tools have been touting those benefits, and more employers are turning to algorithms to help diversify their workforce.

But a report this week by Reuters about an experimental project at Amazon to use algorithms and artificial intelligence to recruit workers was a reminder that such high-tech tools aren’t always a cure-all.

The Reuters report said the tool -- an experiment that was scrapped by the start of last year -- was trained to evaluate applicants by observing patterns in resumes submitted over 10 years, most of which came from men. The system effectively “taught itself that male candidates were preferable,” according to Reuters, penalizing resumes that included the word “women’s” and downgrading graduates of two all-women’s colleges. It also returned essentially random candidates who were unqualified for the roles. In an emailed statement, a company spokeswoman said “this was never used by Amazon recruiters to evaluate candidates.”
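To see how a system can “teach itself” such a preference, consider a simplified sketch -- not Amazon’s actual system, and with an invented toy dataset and scikit-learn standing in for whatever the company used. When historical hiring labels are skewed, a model fit on them assigns a negative weight to a word like “women’s” without anyone programming it to:

```python
# Minimal sketch (not Amazon's system): a screener trained on historically
# skewed hiring outcomes learns to penalize the word "women's" on its own.
# The tiny dataset below is invented purely for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Past resumes and whether the (mostly male) applicant was hired.
resumes = [
    "software engineer java distributed systems",
    "captain chess club software engineer",
    "women's chess club captain software engineer",
    "women's coding society member software engineer",
]
hired = [1, 1, 0, 0]  # labels reflect past bias, not merit

vectorizer = CountVectorizer(token_pattern=r"[a-z']+")
X = vectorizer.fit_transform(resumes)
model = LogisticRegression().fit(X, hired)

# The model assigns a negative weight to "women's" purely from the labels.
idx = vectorizer.vocabulary_["women's"]
print("weight for \"women's\":", model.coef_[0][idx])
```

The point of the sketch is that no feature engineering or malice is required: any token correlated with the historically disfavored group inherits a penalty.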

Even though output from the tool wasn’t used to evaluate candidates, analysts and researchers who study artificial intelligence in hiring say the incident is a warning about the risks of such technology.

“This is a perfect example of what to watch for,” said Josh Bersin, an industry analyst who studies workplace technology and advises companies. “This is the biggest risk of A.I. in recruiting, that it will perpetuate all the biases we’ve had.”

Analysts said the use of artificial intelligence and data science in recruiting has grown from technology that initially screens resumes for keywords -- automating the front lines of hiring -- to analyzing the attributes of a company’s best performers and then “learning” how to match applicants' resumes or assessments to them. Some companies are going further, using artificial intelligence to try to remove biased decision-makers and bring in humans only at the last step of the process.
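A rough sketch of that matching step -- assuming a simple bag-of-words model, where real vendors use far richer signals, and with all names and resume text invented -- shows why the “ideal candidate” profile inherits whatever shaped the current top performers:

```python
# Minimal sketch of resume-to-top-performer matching via text similarity.
# A simplifying assumption: real systems use far more than keyword overlap.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

top_performer_resumes = [
    "python machine learning pipelines data warehouse",
    "customer analytics sql dashboards stakeholder reporting",
]
applicants = {
    "applicant_a": "sql reporting dashboards customer analytics",
    "applicant_b": "graphic design branding illustration",
}

vectorizer = TfidfVectorizer()
profile = vectorizer.fit_transform(top_performer_resumes)

# Score each applicant by similarity to the profile built from current
# top performers -- exactly where inherited bias can creep in.
for name, text in applicants.items():
    vec = vectorizer.transform([text])
    score = cosine_similarity(vec, profile).max()
    print(name, round(float(score), 2))
```

If the existing top performers skew toward one demographic, anything correlated with that demographic raises the score, which is the failure mode the Amazon episode illustrates.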

But employers have learned that it can be more challenging than it sounds. Brian Kropp, group vice president for Gartner’s human resources practice, said, “I could tell you 10 to 20 other stories where companies have tried to create algorithms,” telling themselves “they’ve eliminated bias in the hiring process and all they’ve done is institutionalized biases that existed before or created new ones. The idea that you can eliminate bias in your hiring process via algorithm is highly suspect.”

He shared the story of how one company noticed that people from a certain Zip code quit more often, probably because of longer commute times, and decided it was going to stop interviewing people from that Zip code.

“What they didn’t take into account was that there’s a demographic distribution across Zip codes. Their mix of employees and candidates became much less diverse,” he said. The company inadvertently hired fewer people of color before it corrected the mistake.

Kropp said a survey done in early 2018 of companies Gartner works with found that 43 percent reported using an algorithm to make a hiring decision. In nearly all of those cases, the algorithm would generate a “score” that hiring managers could take into account, alongside their own judgment, when making the final decision on whom to hire.

Solon Barocas, an assistant professor in Cornell University’s Information Science department who has done research on how algorithms can be unintentionally discriminatory in hiring practices, said one problem is that the underlying data can be biased. An algorithm trained to match candidates to top performers may be based on performance reviews that are themselves biased, thanks to managers who rate some employees more favorably than others or to metrics that aren’t gender neutral.

(For instance, female leaders are often penalized when seen as too assertive -- but having an “aggressive drive for sales” may be a “competency” on which employees are graded.) “Even with the annual review score, there’s human bias involved in that assessment,” Barocas said.

Others drew a distinction between a tool that’s built in-house and crunches data on resumes submitted to one company and outside tech that filters data from millions of workers.

“You have to look at thousands of different companies’ data points,” said Kieran Snyder, CEO of Textio, which helps companies write and format job descriptions or candidate email communications to cut down on bias. “If you’re only looking at your own, not only will the A.I. not help you, it will doom you to repeating the problems you already have.”

Some companies offering artificial intelligence tools for hiring say they’re focused intently on eliminating bias. Pymetrics, for instance, which has applicants complete neuroscience-based “games” that measure traits like attention, delayed gratification, and how people filter out distraction, says such tools can be more predictive of strong hires than resume data, which it doesn’t include in its analysis of candidates at all.

It also audits its algorithms, comparing the results of different gender, racial and ethnic groups and then weighting the results “until everyone has an equal chance of passing them,” said Priyanka Jain, Pymetrics' head of product.
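One common way to run that kind of audit -- a plausible sketch, not Pymetrics’ actual code, with invented data -- is to compare pass rates across groups against the “four-fifths” benchmark used in U.S. employment-selection guidelines:

```python
# Minimal sketch of a pass-rate audit across demographic groups.
# The data below are invented; 0.8 is the common "four-fifths rule" benchmark.
def pass_rates(results):
    """results: dict mapping group name -> list of 0/1 pass outcomes."""
    return {group: sum(r) / len(r) for group, r in results.items()}

def adverse_impact_ratio(results):
    """Ratio of each group's pass rate to the highest group's pass rate."""
    rates = pass_rates(results)
    best = max(rates.values())
    return {group: rate / best for group, rate in rates.items()}

results = {
    "group_a": [1, 1, 0, 1, 1, 0, 1, 1],  # 75% pass
    "group_b": [1, 0, 0, 1, 0, 0, 1, 0],  # 37.5% pass
}

for group, ratio in adverse_impact_ratio(results).items():
    flag = "OK" if ratio >= 0.8 else "FLAG: below four-fifths threshold"
    print(f"{group}: ratio {ratio:.2f} -> {flag}")
```

A flagged group is the signal to reweight or retrain, which is the step Jain describes as continuing “until everyone has an equal chance.”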

Even if there are ways to reduce bias in recruiting algorithms, the day when robots are actually making hiring calls still seems a long way off. Kropp said he knows a few companies that are piloting experiments where they let algorithms make a final decision for some high-volume, entry-level jobs, such as retail sales or customer service, hiring people and then giving them three to six months to see how they do. In those cases, he said, “the equation seems to be just as good as the hiring manager at making a decision, but neither are particularly good.”

Yet Michael Gretczko, who leads a human capital practice at Deloitte, said that’s unlikely to become widely used.

“When it comes down to those final decisions about making a judgment call, that requires intuition,” he said, an activity that is best done by humans, “now -- and for some time in the future.”
