Almost exactly three years ago, I attended a conference at Harvard and MIT that brought together two groups not used to talking to each other: on one side, fact checkers and those who study misinformation in U.S. politics; on the other, computer scientists, programmers and thinkers from the tech community.
The goal was to “understand and address propaganda and misinformation in the new media ecosystem.” And I won’t soon forget how refreshing it was to hear engineers and techies discuss how they might design programs to fight falsehoods in real time. There was a feeling, basically, that where fact and reason had failed to convince people that global warming is real, or that vaccines are safe, maybe new apps could come to the rescue.
It’s in this context that I think about the amazing strides made by Google engineers, whose mere research paper outlining how they might automate the identification of factually accurate Web sites has touched off a storm. Google isn’t programming its searches to do this — it’s merely studying the matter. But I suspect that in 2012, it would have been the star idea of the conference.
I wrote about Google’s research yesterday and was on MSNBC’s “All in With Chris Hayes” last night to discuss it, and as a result I’ve been thinking a lot about this and what it means. There seems to be a fair amount of confusion, and some critical points that I think people are missing. In a sense, Google is both closer to, and farther from, doing this than people seem to realize.
Google told me — and Chris Hayes — that it is just studying ways of assessing the quality of Web sites based on their accuracy. Google’s actual statement is the following: “This was research — we don’t have any specific plans to implement it in our products. We publish hundreds of research papers every year.”
At the same time, however, it’s clear that Google is interested in adding factual information to searches. It has implemented a feature called Knowledge Graph, which does just that. Google “Charles Dickens,” and a box on the right side of the screen gives you biographical information, a list of published works and more. And — more relevant to debunking falsehoods — Google has just started providing carefully vetted medical information when you search for medical topics, in the same format.
“One in 20 Google searches are for health-related information,” explained a Google blog post announcing this feature. “And you should find the health information you need more quickly and easily.”
Moreover, the new research on ranking sites by accuracy, which Google’s engineers have just shown is technically feasible in their much-discussed paper, draws upon Google’s incredible Knowledge Vault. This is a repository that has pulled 2.8 billion “facts” from the Web and estimated the probability that each one is true by cross-referencing it with other sites and sources. Several hundred million of these “facts” are assigned a greater than 90 percent probability of being accurate and true (in the researchers’ model, to be sure).
This is incredible, and yes, algorithms and machine learning are doing it, though with human programmers behind it all (one of them has given a public lecture about Knowledge Vault).
As these examples show, Google is already in the “fact” game. But Google is also contemplating a lot less than what some climate “skeptics” seem to fear.
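To make “cross-referencing” a little more concrete, here is a toy Python sketch. It assumes, purely for illustration, that a fact’s probability is simply the share of sources that agree on it; Knowledge Vault’s real model is far more sophisticated, and the sources and facts below are made up. The 0.9 cutoff mirrors the “greater than 90 percent” figure.

```python
from collections import Counter, defaultdict

# A claim is (source, subject, predicate, object). The rows used below are
# invented for illustration; they are not Knowledge Vault data.
Claim = tuple[str, str, str, str]
Triple = tuple[str, str, str]


def fuse(claims: list[Claim]) -> dict[Triple, float]:
    """Toy fusion: a triple's 'probability' is the share of distinct sources
    that give that object for the same (subject, predicate) pair."""
    votes: dict[tuple[str, str], Counter] = defaultdict(Counter)
    for _source, subj, pred, obj in set(claims):  # ignore exact duplicates
        votes[(subj, pred)][obj] += 1
    probs: dict[Triple, float] = {}
    for (subj, pred), counter in votes.items():
        total = sum(counter.values())
        for obj, n in counter.items():
            probs[(subj, pred, obj)] = n / total
    return probs


def confident(probs: dict[Triple, float], threshold: float = 0.9) -> list[Triple]:
    """Triples whose estimated probability exceeds the threshold, sorted."""
    return sorted(t for t, p in probs.items() if p > threshold)


# Ten hypothetical sites agree on one object; one dissents.
claims = [(f"site{i}", "Barack Obama", "nationality", "USA") for i in range(10)]
claims.append(("site10", "Barack Obama", "nationality", "Kenya"))
claims += [(f"site{i}", "Charles Dickens", "born_in", "Portsmouth") for i in range(3)]

probs = fuse(claims)
print(confident(probs))
# → [('Barack Obama', 'nationality', 'USA'), ('Charles Dickens', 'born_in', 'Portsmouth')]
```

Here the USA triple scores 10/11 (about 0.91) and clears the bar, while the lone dissenting claim scores about 0.09. A real system would also have to weigh how reliable each source and each extraction step is, which this sketch ignores.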
Remember, this all stems from a research paper in which Google engineers used sophisticated algorithms to see whether they could automatically estimate the trustworthiness of Web sites. They did so by comparing the “facts” those sites contain with a gigantic existing database, Google’s Knowledge Vault.
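Here is a deliberately simplified Python sketch of that kind of comparison, with each fact written as a (subject, predicate, object) tuple. The knowledge base, the sites and the scoring rule (the share of a site’s checkable facts that match the database) are all hypothetical; the researchers’ own model is probabilistic and considerably more elaborate.

```python
from typing import Optional

# Hypothetical reference knowledge base mapping (subject, predicate) -> object.
# It stands in for Knowledge Vault purely for illustration.
KB = {
    ("Barack Obama", "nationality"): "USA",
    ("Barack Obama", "birthplace"): "Honolulu",
}


def trust_score(site_triples, kb=KB) -> Optional[float]:
    """Fraction of a site's checkable triples that agree with the KB.

    Triples the KB says nothing about are skipped rather than counted as
    wrong. Returns None if none of the site's triples can be checked.
    """
    checked = agreed = 0
    for subj, pred, obj in site_triples:
        expected = kb.get((subj, pred))
        if expected is None:
            continue  # the KB has no entry for this fact
        checked += 1
        if obj == expected:
            agreed += 1
    return agreed / checked if checked else None


# Two hypothetical sites, described only by the triples extracted from them.
site_a = [("Barack Obama", "nationality", "USA"),
          ("Barack Obama", "birthplace", "Honolulu")]
site_b = [("Barack Obama", "birthplace", "Kenya"),
          ("Barack Obama", "nationality", "USA"),
          ("Example Person", "hobby", "chess")]

print(trust_score(site_a), trust_score(site_b))  # → 1.0 0.5
```

Skipping facts the database doesn’t cover, rather than counting them as errors, is a design choice: otherwise a site would be penalized simply for discussing topics the database hasn’t catalogued.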
But what kind of “facts” does Knowledge Vault contain? The Google researchers call them “knowledge triples.” The reason is that, in the researchers’ words, they take the form “subject, predicate, object” — e.g., “Barack Obama, nationality, USA.” (This is the real example the researchers give.)
As I told Hayes last night, this is what you might call a “simple fact” — which is presumably why large volumes of them can be processed by computer programs. It’s a fact that can be expressed in the following way: The x (nationality) of y (Barack Obama) is z (American).
That’s very different from what you might call a sophisticated or nuanced fact: “Most of the global warming in the last 50 years was caused by human beings, to a high degree of certainty.” I’m not saying engineers can’t find a way to automate the recognition of these more complex facts as well; I’m betting that they can. And they could certainly automate searches for credible scientific sources, such as the National Academy of Sciences, or for citations to them.
But for now, evaluating the veracity of more complex statements still seems to rely on human discernment and judgment.
All of which is another way of saying that if Google gets into the truth game, birthers, not climate change skeptics, are the ones who should worry first, because birthers literally believe in an incorrect triple. Their error is so stark that a program can evaluate and discard it.
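A simple fact of this kind can be checked mechanically, as this small Python sketch suggests. The reference table and the `check` function are hypothetical stand-ins, not anything Google has published; the point is only that a claim shaped like “the birthplace of Barack Obama is Kenya” can be looked up and flagged, whereas a nuanced statement about climate science doesn’t reduce to a single lookup.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # the triple's "object" slot

    def as_sentence(self) -> str:
        # Render as "The x (predicate) of y (subject) is z (object)."
        return f"The {self.predicate} of {self.subject} is {self.obj}."


# Hypothetical reference facts, keyed by (subject, predicate).
REFERENCE = {
    ("Barack Obama", "nationality"): "USA",
    ("Barack Obama", "birthplace"): "Honolulu",
}


def check(claim: Triple) -> str:
    """Classify a claim as 'supported', 'contradicted' or 'unknown'."""
    expected = REFERENCE.get((claim.subject, claim.predicate))
    if expected is None:
        return "unknown"  # nothing to compare against
    return "supported" if claim.obj == expected else "contradicted"


birther_claim = Triple("Barack Obama", "birthplace", "Kenya")
print(birther_claim.as_sentence())  # → The birthplace of Barack Obama is Kenya.
print(check(birther_claim))  # → contradicted
```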
The technique so far is limited, then — but the fact that it can apparently detect whether a site contains birther misinformation shows that it is already politically relevant.
It’s a really exciting and brave new world, where big data and smart engineers can indeed, in theory, build tools and apps to help protect us from misinformation. What remains up in the air right now is whether, and precisely how, these will be implemented.
While Hayes says he’s worried about Google determining what’s “true,” I have to say that I feel much more positive about it, in the following sense. We know by now that arguing the science with someone who’s wrong doesn’t work — and neither does citing the best experts. Minds are too closed, and humans are all too human.
So maybe we do need the help of engineers, rather than just scientists and experts, to make our world just a little more truthful.