This week researchers demonstrated that by analyzing a person’s Web searches they could in some cases predict an upcoming diagnosis of pancreatic cancer.
The researchers aren’t pancreatic cancer experts but computer scientists at Microsoft. Unlike traditional medical professionals, they have access to a trove of data that Microsoft collects through its search engine, Bing.
The Microsoft researchers identified Web users who had recently entered queries indicating they had pancreatic cancer, such as “I was told I have pancreatic cancer, what to expect,” and then looked back months earlier to examine patterns in the symptoms those users had searched for. These included phrases such as “dark or tarry stool,” “abdominal swelling,” “dark urine” and “yellowing skin.”
From this analysis they found trends in the queries of users who were soon to be diagnosed with pancreatic cancer, and were able to identify 5 to 15 percent of cases with low false-positive rates. The research was published in the Journal of Oncology Practice.
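The look-back step described above can be illustrated with a minimal Python sketch. It is not the researchers’ actual method: the function names, the 180-day window, the simple substring matching and the sample search log are all assumptions made for illustration. Only the quoted symptom phrases and the example diagnostic query come from the article.

```python
from datetime import date, timedelta

# Hypothetical marker for a "diagnostic" query, based on the example quoted in the article.
DIAGNOSIS_MARKERS = ("i was told i have pancreatic cancer",)
# Symptom phrases quoted in the article.
SYMPTOM_PHRASES = ("dark or tarry stool", "abdominal swelling", "dark urine", "yellowing skin")


def first_diagnosis_date(log):
    """Return the date of the earliest query that reads like a diagnosis, or None."""
    dates = [d for d, q in log if any(m in q.lower() for m in DIAGNOSIS_MARKERS)]
    return min(dates) if dates else None


def earlier_symptom_queries(log, lookback_days=180):
    """Collect symptom queries made in the window before the diagnostic query.

    The 180-day default is an assumed value, not one taken from the study.
    """
    diagnosed = first_diagnosis_date(log)
    if diagnosed is None:
        return []
    start = diagnosed - timedelta(days=lookback_days)
    return [
        (d, q)
        for d, q in log
        if start <= d < diagnosed and any(s in q.lower() for s in SYMPTOM_PHRASES)
    ]


# Made-up search log for a single anonymous user: (date, query) pairs.
log = [
    (date(2016, 1, 10), "dark urine causes"),
    (date(2016, 2, 3), "yellowing skin and itching"),
    (date(2016, 3, 1), "best pizza near me"),
    (date(2016, 5, 20), "I was told I have pancreatic cancer, what to expect"),
]
hits = earlier_symptom_queries(log)
```

Here `hits` holds the two symptom searches that preceded the diagnostic query. A real system would also need a statistical model and a comparison group of users who never make a diagnostic query, which this sketch leaves out.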
“This is really a big step forward and I think it’s exciting research,” said Robert Grossman, a professor of medicine at University of Chicago and director of the Center for Data Intensive Science. “Finding low cost, low risk, high coverage health surveillance systems is an important challenge.”
But Alison Patricia Klein, a professor of oncology at Johns Hopkins University who studies pancreatic cancer, cautioned that there are limits to how useful early detection through Web searches can be for pancreatic cancer, because it is generally too late for a patient once symptoms emerge.
“It’s interesting that they can pick out a portion of individuals that might have pancreatic cancer,” Klein said. “My concern is, for many of the people it’s picking up, they’re in the later stages of the disease where we don’t have very good therapies.”
She also warned that analyzing digital data would not necessarily deliver a representative sample of all individuals with a medical condition. Low-income patients tend to have less access to digital technology, so they will create less data to be analyzed. This is problematic because health-risk factors can differ across the population.
For Grossman, the Microsoft research demonstrates the power this new data can have, but a big hurdle is how to implement such an approach.
“There’s a lot of tough questions that need to be worked out,” Grossman said. “Research needs to be done on how best to design and deploy search-based surveillance systems, while [receiving consent from] the users and respecting their privacy.”
As consumers spend more time using digital services, ranging from Fitbits to smartphones and search engines, a wealth of new and potentially useful medical information is being created. And that data is largely in the hands of a new class of players such as Microsoft rather than the traditional health providers.
The Microsoft researchers acknowledge these challenges in their paper. A surveillance system would need to convey the uncertainty linked to detection while also considering the search engine’s liability and the anxiety the news may generate in a Web searcher. They mention the possibility of alerting physicians separately from patients.