Google, Tesla, Uber and President Obama all cite the same statistic: 94 percent. That’s the share of U.S. crashes that, as Obama put it, are “the result of human error or choice.” (Tony Avelar/AP)

President Obama says automated vehicles could cut the yearly death toll on U.S. roads by tens of thousands. His highway safety chief talks about “a world where we could potentially prevent or mitigate 19 of every 20 crashes on the road.” Uber says self-driving cars “can help save millions of lives” worldwide.

Their message is clear: Robots will be better drivers than we are.

But despite the excitement and the hype, top engineers and federal regulators face a basic problem. They’re still struggling with exactly how to compare man (or woman) with machine.

“One of the hardest questions to answer is, ‘How do these cars compare to human drivers?’ ” Chris Urmson, then the chief of Google’s self-driving car project, told transportation engineers in Washington this year. “And part of the reason why that’s hard is we don’t actually have a good understanding of how good human drivers really are.”

One problem is that the U.S. government keeps no comprehensive database of crashes. That complicates what otherwise might seem to be a simple task: figuring out which vehicles are more likely to crash, human-driven ones or those run by software and sensors.


Researchers at the Virginia Tech Transportation Institute, in a study funded by Google, dug into the data and discovered just how incomplete the federal numbers are.

An annual national tally of crashes relies heavily on those reported to police. It understates the actual total of crashes with injuries by at least a quarter and “property damage only” crashes by anywhere from 60 to 84 percent, they concluded. And they consider those numbers conservative, given the mishmash of state reporting requirements and holes in the local data used by federal agencies.

“It is crazy,” said Myra Blanco, a senior Virginia Tech researcher who was lead author on the study. “We knew there were going to be some discrepancies” with other sources, but not that many.

Many drivers prefer to keep crashes off the books, either because they can’t be bothered to report minor run-ins or because they want to avoid insurance premium hikes.

The researchers cited an earlier federal analysis and telephone survey for the lower-end estimates of the underreporting.

A bigger shortfall was revealed by in-car cameras that captured thousands of motorists in the wild as part of a major federal safety study. Even though they had volunteered to cooperate (and to be recorded), some of those drivers failed to notify researchers of their crashes, as they had been instructed to do.


“As human beings, we don’t want people to think poorly of us,” Blanco said.

But the cameras didn’t lie: 84 percent of crashes weren’t reported to police.

Based on general accident data, Google’s self-driving cars appeared to crash more often than cars operated by humans. But once the underreporting was factored in, they performed better than people. The researchers also noted that the self-driving cars were not at fault in any of the 11 crashes studied. The small numbers made it statistically tough to identify “true differences,” the researchers said.

“We understand this is just directional, and we understand it’s not definitive. But it’s exciting and interesting,” said Urmson, a longtime driverless-car leader who left Google this summer.

Police-reported crashes reached 6.3 million last year, according to U.S. figures, about half the number reported to insurers. One federal study estimated 13.6 million total crashes in a year; Virginia Tech used an upper estimate of 29 million.

Google wants the National Highway Traffic Safety Administration to cut through the noise and create a comprehensive database or provide reliable sampling “in order to accurately benchmark the performance of self-driving vehicles.”

In search of new metrics

NHTSA chief Mark Rosekind said this summer that “equivalency” between human and machine is far from good enough.

“We should not move forward when automated vehicles are just as safe — or really, as dangerous — as human drivers. They need to be much safer,” Rosekind told industry leaders in San Francisco. He made clear that how much safer is enough remains an open question: “Two times safer? Five times? Ten times? And what does ‘safer’ actually mean?”

Road deaths, tracked in a detailed census, climbed 7 percent last year to 35,092. This month, alarmed federal officials said they had soared another 10 percent in the first half of 2016.

Rosekind has been traveling the country asking manufacturers, software developers and safety experts some version of the same question: “What are the new safety metrics we need to be using now? Do we count by crashes? Do we count by fatalities?”

He also pressed for ways to “count the lives saved” by the technology.

In policy guidance for tech and auto companies released last month, federal officials laid out a 15-point safety assessment. Companies are asked to describe where and under what conditions their cars are designed to drive: for instance, in daylight on dry roads or in a particular city. Then they’re asked to document how they know the cars will be safe.

But officials didn’t set out actual safety measurements, saying the federal government first needs to do more research, including gaining real-world insights from the companies.

Coming up with “performance metrics” was put on the list of “follow-on actions.”

Skeptics see this as a bad sign.

“It actually speaks volumes when an agency says: ‘Help us. We don’t even know how to measure you,’ ” said Missy Cummings, who heads Duke University’s Humans and Autonomy Lab and has warned that self-driving cars have been unleashed before being proved safe. “That’s the job.”

Others called that an unfair swipe at a federal effort to tap the expertise of private industry in a fast-changing area. Federal officials say they want to avoid imposing half-cocked standards that end up stymieing innovation. They also say the regulator’s traditional job description needs to be reexamined.

“There’s a tremendous power in harnessing this technology for good,” said Transportation Secretary Anthony Foxx.

Regulations will come later, as industry and the department learn more, Foxx said. Although regulations have generally been put in place after technologies reached the market, he said, the department is trying to shape safety on the front end.

“But you can only begin at the beginning, and that’s where we are today,” Foxx said.

Countless variables

Still, the question remains: How will the government know when algorithms are safer than humans? How will companies know? And which humans are we actually talking about being better than?

“Twice as good as what? Twice as good as a 16-year-old? Twice as good as a 50-year-old?” asked Brandon Schoettle of the University of Michigan Transportation Research Institute. Who you choose for the sake of comparison “really makes a big difference,” Schoettle said.

Complicating matters further, experience is just one of countless variables. Drunk drivers are involved in about a third of all fatal crashes, for example, dragging down the stats for the rest of us.

“If you’re an alert, attentive, sober driver, your risk is really low. You’re really good at avoiding crashes,” said Tom Dingus, director of the Virginia Tech Transportation Institute. “It will be difficult making automated vehicles that are as good at avoiding crashes as you are.”

Over 25 years, the average driver will successfully step on the brakes 3 million times and smash into the car ahead just once, Dingus said. Beating that performance in all conditions will be tougher than people think.

In the meantime, there will be a range of designs and results, Dingus said. That’s particularly true with partially automated cars, in which the human driver is supposed to be ready to take control at any moment.

“There’s likely to be good ones and bad ones. There may be some that actually increase crashes and not decrease them,” Dingus said.

Pioneering electric carmaker Tesla has collected tens of millions of miles of driving data from customers’ cars to see how its semi-automated features stack up. Tesla says human drivers working with its “autopilot” technology are safer than humans driving alone, despite a crash death in May.

The company noted that one person dies for every 89 million miles traveled on U.S. roads. “Autopilot miles will soon exceed twice that number, and the system gets better every day,” Tesla said. The company says “both the frequency and severity of collisions” should be part of any metrics evaluated by the government.

Whether or not they end up being required by law, “I think everybody agrees there need to be metrics” and effective tests to make sure driverless cars are “really ready,” Dingus said.

To err is human

For now, the lifesaving promise of automation often gets boiled down to a single statistic: 94 percent.

Obama, safety regulators, Google, Tesla and Uber all cite the number.

It’s the share of U.S. car crashes that, as Obama put it, are “the result of human error or choice,” a catchall for the combination of imperfections, idiocies and tragic mistakes that hurt millions every year. It’s a basis for the broad safety claims made by industry and officials.

But knowing the depth of human havoc on the road is just the beginning.

“It’s too simplistic to think that because 94 percent of crashes are caused by human error, that taking the human out of the equation is going to eliminate 94 percent of the crashes,” said David Zuby, chief research officer at the Insurance Institute for Highway Safety.

Tracking actual safety gains will be messy. For many years, humans may do better in some areas while robots outperform them in others.

“You may just need to set up different comparison groups and convince yourself, ‘Maybe in this one comparison we’re not better than humans, but look at all these areas where our crash rates are better than humans,’ ” Zuby said.

And as cars with imperfect automation are turned loose on the roads, which he considers likely, “it will be interesting to see how many crashes they really eliminate. There isn’t really a good way to know that without seeing it.”