The Washington Post: Democracy Dies in Darkness

The incredible potential and dangers of data mining health records

Using the world’s best computers to analyze our health data could revolutionize how we live. (Connie Zhou/Google)

Imagine if your doctor could compare your physical health, diet and lifestyle to a thousand Americans with similar characteristics, and realize that you need treatment to prevent heart failure next month.

What if an analysis of your genome could help a physician give you a customized cancer treatment that saves your life?

Unleashing the modern power of computers, data crunching and artificial intelligence could revolutionize health care, improving and extending lives.

It’s the kind of potential Google chief executive Larry Page hinted at when he told the New York Times earlier this year that “we’d probably save 100,000 lives next year” if health care data were mined.

“Imagine you had the ability to search people’s medical records in the U.S.,” Page said in another interview this summer. “I imagine that would save 10,000 lives in the first year.”

Page’s numbers sound impressive but are speculative and unfounded, according to many in the medical industry.

Interviews with more than a dozen health care professionals and data scientists found no evidence backing Page’s specific claims. While they universally agree that data mining — the examination and analysis of huge batches of information — could invigorate health care, they caution that any sort of accurate estimate would be impossible.

“Usually when I see someone put a number on it and throw around saving lives it usually means one, they aren’t usually a clinician or someone who provides care, or No. 2 it’s someone who really knows better, but is trying to grab a headline,” said Nicholas Marko, the department head of data science at the Geisinger Medical Center.

A Google spokeswoman declined to offer an explanation of Page’s numbers, or make him available for comment.

In another instance of Page using an unsubstantiated health care statistic, he told Time magazine last year that solving cancer would only “add about three years to people’s average life expectancy.” The American Cancer Society and the National Cancer Institute had never heard of that figure. A Google spokeswoman didn’t have an answer when asked for an explanation.

To a cynic, Page is a shrewd businessman twisting facts to shape the national dialogue so that he can profit from absorbing our health data into the Google cloud, where his world-class engineers will find ways to make money off all of that information.

An optimist might remember Page’s assertion that Google is a company devoted to solving “huge problems for hundreds of millions of people,” and offer him the benefit of the doubt.

“Health care has been pretty archaic. We’re pretty behind the curve on things,” said Lorren Pettit, a vice president for the Healthcare Information and Management Systems Society, which aims to improve health care through information technology. “We need the innovation of people from outside health care to come in and take a look and challenge this industry, and yes with data mining there’s a great world of possibility.”

Shaking up industries is part of Google’s DNA. Its self-driving car project could in theory eliminate the 1.24 million fatalities a year on global roads. If Page can ease the country’s fears about sharing health data, and that sharing ends up saving lives, does the end justify the means of fuzzy math?

“There’s tremendous opportunity if we start taking individualized genomic data and health histories and assuming you can perfectly de-identify it, my gosh, if you can mine that and look for patterns between genomic sequences and types of illnesses and effects of treatment on those illnesses you could potentially do a tremendous amount for society and the health of our individuals,” said Christopher Jaeger, Sutter Health’s chief medical information officer.

The average person might spend a few hours a year with a physician, during which data about their health (blood pressure, alcohol consumption, weight, etc.) is written down. If a patient’s health data were tracked around the clock, as devices such as smartwatches are making realistic, the amount of data about that person’s health would grow enormously. More information, and the comparison of that information with data from other patients, should lead to better treatments.

“It would be great if when the patient walked in our Bluetooth sensors picked up their phone and it pushed in all their exercise and diet history, and then there were analytics that were performed in real time,” said Thomas Graf, chief medical officer at Geisinger Health System. “When the doc walked in the room they can say ‘Oh, looks like you’re exercising at 80 percent of what we were talking about.’ ”

But fear of litigation, privacy concerns, regulations and the challenge of collecting and standardizing data all stand in the way of realizing this health care utopia. Still, there are some early examples that hint at what could be done.

Researchers at the University of Bern in Switzerland have built a computer program to better measure the size of brain tumors.

Traditionally, radiologists look at MRI scans and measure a tumor’s size in two dimensions. The computer program, called BraTumIA, analyzes the tumor’s volume in three dimensions, which better shows whether it is shrinking or growing. Getting measurements right is crucial as physicians determine the best treatment plan for a patient.
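To illustrate why a volume measurement can reveal change that a flat measurement misses, here is a minimal Python sketch. It is not BraTumIA’s code. It assumes the tumor has already been outlined as a 3D yes/no mask (a NumPy array whose first axis runs across MRI slices), and it uses the area on the single largest slice as a hypothetical stand-in for a radiologist’s two-dimensional measurement. The function names and the toy tumors are invented for this example.

```python
import numpy as np

def tumor_volume_ml(mask: np.ndarray, spacing_mm: tuple[float, float, float]) -> float:
    """Volume of a binary 3D tumor mask in milliliters (1 ml = 1000 cubic mm)."""
    voxel_mm3 = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    return float(mask.sum()) * voxel_mm3 / 1000.0

def largest_slice_area_mm2(mask: np.ndarray, spacing_mm: tuple[float, float, float]) -> float:
    """Hypothetical 2D proxy: tumor area on the slice where it looks biggest.

    Assumes axis 0 is the slice axis, so each slice spans axes 1 and 2.
    """
    voxels_per_slice = mask.sum(axis=(1, 2))
    return float(voxels_per_slice.max()) * spacing_mm[1] * spacing_mm[2]

# Toy example with 1 mm voxels: the same 10 x 10 mm cross-section,
# spread over 2 slices "before" and 6 slices "after".
spacing = (1.0, 1.0, 1.0)
before = np.zeros((20, 32, 32), dtype=bool)
before[8:10, 10:20, 10:20] = True
after = np.zeros((20, 32, 32), dtype=bool)
after[6:12, 10:20, 10:20] = True

print(largest_slice_area_mm2(before, spacing), largest_slice_area_mm2(after, spacing))
print(tumor_volume_ml(before, spacing), tumor_volume_ml(after, spacing))
```

In this toy case both tumors look identical on their largest slice, 100 square millimeters, yet the second one has three times the volume (0.6 ml versus 0.2 ml).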

“If I ask two radiologists to do the same job, you will see differences,” said researcher Mauricio Reyes. “The computer has the ability to be more consistent and more objective over time. Even if you have an error in the computer this error is consistent over time. What really matters is the trend.”

Here’s how the program works. A set of annotated brain scans, in which the different parts of a tumor are labeled, is preloaded into the program. The program uses those scans as a guide to teach itself to classify regions of new brain scans as tumor or not.
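The article does not say which learning algorithm BraTumIA uses. As a rough illustration of learning from annotated examples, the sketch below swaps in one of the simplest possible methods, a nearest-centroid classifier: it averages the features of the voxels labeled tumor and not-tumor, then labels each new voxel by whichever average it is closer to. The two intensity values per voxel, the labels, and the function names are all hypothetical.

```python
import numpy as np

def train_centroids(features: np.ndarray, labels: np.ndarray) -> dict[int, np.ndarray]:
    """Learn one mean feature vector per label from annotated voxels."""
    return {int(c): features[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict(features: np.ndarray, centroids: dict[int, np.ndarray]) -> np.ndarray:
    """Assign each voxel the label of its nearest centroid (Euclidean distance)."""
    classes = np.array(sorted(centroids))
    stacked = np.stack([centroids[c] for c in classes])  # shape (k, d)
    # Distance from every voxel to every centroid, shape (n, k).
    dists = np.linalg.norm(features[:, None, :] - stacked[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

# Hypothetical annotated training voxels: two intensity values each,
# labeled 0 (healthy tissue) or 1 (tumor) by an expert.
train_features = np.array([[0.20, 0.30], [0.25, 0.35], [0.80, 0.90], [0.85, 0.80]])
train_labels = np.array([0, 0, 1, 1])
centroids = train_centroids(train_features, train_labels)

# Voxels from a new, unlabeled scan.
new_voxels = np.array([[0.30, 0.30], [0.90, 0.95]])
print(predict(new_voxels, centroids))
```

Real systems use far richer features and models; the point is only that the labeled examples, not hand-written rules, determine how new scans are classified.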

The end result: a scan can be analyzed in five minutes on a laptop, giving a better understanding of the tumor. If more medical images made their way into the databases behind programs such as BraTumIA, those services would get even better.

But what if health data we think is anonymous gets identified or hacked? The threat of being sued deters health organizations from sharing data and embracing the full potential of data mining.

For example, MRI exams and CT scans of a patient’s head could be used to reconstruct the person’s face. A hacker with access to such a database could use facial-recognition software to cross-check the scans against a Web site where users post photos of themselves.

“If the same person has a Facebook account there’s a good chance that you could identify this person. If I had access to such a database I could give you a list of people in Facebook with names of who has a brain tumor,” cautioned Bjoern Menze, a computer science professor at TU München who researches medical imaging.

“There will be criminals. There will be people who are bad actors. At some point something is going to get out,” Graf said. “It’s not an irrational fear. At the same time, people die driving every year and we still choose to drive cars, or most of us do. It’s a risk every person has to decide where they fall on the line.”

Many of those I interviewed anticipated a situation where patients could decide whether to opt into data mining of their health records. A tax benefit might even be given to encourage involvement.

If health records are ever going to be data mined, it’ll happen when consumers are convinced the perks outweigh the costs. The world has already seen dramatic changes to privacy norms as services such as Facebook have grown in popularity. Its now hugely popular News Feed, which funnels the latest information about friends into a single stream, was initially met with an uproar from users concerned about their privacy.

But as users saw the utility of the feed, the tradeoff in privacy became acceptable.

“The goal in health care is not to protect privacy, the goal is to save lives. We need to have that as starting point,” said David Castro, director of the Center for Data Innovation. “Is the doctor treating me based on the last couple patients he saw, or is he treating me based on the rigorous analysis of millions of patients and finding the 5,000 that are actually just like me, and treating me in a much more accurate way?”

For data mining to succeed, health care would also need to recruit top data scientists, which isn’t easy given the demand in such a hot field.

“It’s hard,” said John Weinstein, chair of bioinformatics and computational biology at MD Anderson Cancer Center. “You really have to battle with Silicon Valley and the Boston academic scene.”

“Why would someone who is really really good at analyzing data come to work for a health care organization and make X dollars when they could go to Google and make 10X dollars?” Marko added.