It always sounded like a pipe dream, but in the past week, there’s been considerable buzz that Google might indeed be considering such a thing. The reason is that a team of Google researchers recently published a mathematics-heavy paper documenting their attempts to evaluate vast numbers of Web sites based upon their accuracy. As they put it:
The quality of web sources has been traditionally evaluated using exogenous signals such as the hyperlink structure of the graph. We propose a new approach that relies on endogenous signals, namely, the correctness of factual information provided by the source. A source that has few false facts is considered to be trustworthy.
As our friends at The Intersect note, this does not mean Google is actually going to do this or put in place such a ranking system for searches. It means it’s studying it.
Indeed, Google gave us the following statement: “This was research — we don’t have any specific plans to implement it in our products. We publish hundreds of research papers every year.”
It’s not the company’s first inquiry into the realm of automating the discovery of fact. The new paper draws on a prior Google project called the Knowledge Vault, which has compiled more than a billion facts so far by grabbing them from the Web and then comparing them with existing sources. For 271 million of these facts, the probability of actual correctness is over 90 percent, according to Google.
The new study, though, goes further. It draws on the Knowledge Vault approach to evaluate pages across the Web and determine their accuracy. Through this method, the paper reports, an amazing 119 million Web pages were rated. One noteworthy result, the researchers write, is that gossip sites and Web forums in particular don’t do very well: they end up being ranked quite low, despite their popularity.
Indeed, when comparing this new method, dubbed “Knowledge-Based Trust,” with the traditional Google PageRank approach, which focuses on links, the researchers found that “the two signals are almost orthogonal.”
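To make “almost orthogonal” concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the site names, the fact counts, the link scores, and the simple `trust_score` ratio, which stands in for the paper’s far more elaborate probabilistic model. It scores each site by the share of its extracted facts judged correct, then checks how that score correlates with a made-up link-based popularity score.

```python
import math

# Hypothetical sites: (facts judged correct, facts extracted, link-based score).
# All names and numbers are invented for illustration; none come from the paper.
sites = {
    "obscure-forum": (5, 10, 1.0),
    "niche-encyclopedia": (9, 10, 2.0),
    "news-outlet": (9, 10, 3.0),
    "popular-gossip-site": (5, 10, 4.0),
}

def trust_score(correct, total):
    """Share of a site's extracted facts judged correct (a crude stand-in)."""
    return correct / total

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

trust = [trust_score(c, t) for c, t, _ in sites.values()]
links = [link for _, _, link in sites.values()]
correlation = pearson(trust, links)
print(f"correlation between trust and link score: {correlation:.2f}")
```

In this contrived data the correlation comes out at essentially zero: knowing how heavily linked a site is tells you nothing about how accurate it is, which is the sense in which two signals can be called orthogonal.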
Google’s new research didn’t explicitly mention how this approach might rank science-contrarian Web sites. But media have been reporting this week that climate change skeptics seem unnerved by the direction in which Google appears to be heading.
If this ever moves closer to a reality, then they should be. If you read the Google papers themselves, for instance, you’ll note that the researchers explicitly use, as a running example, a fact that has become “political.” Namely, the fact that Barack Obama was born in the United States.
From their Knowledge Vault paper, for instance:
For example, suppose an extractor returns a fact claiming that Barack Obama was born in Kenya, and suppose (for illustration purposes) that the true place of birth of Obama was not already known. … Our prior model can use related facts about Obama (such as his profession being US President) to infer that this new fact is unlikely to be true. The error could be due to mistaking Barack Obama for his father (entity resolution or co-reference resolution error), or it could be due to an erroneous statement on a spammy Web site (source error).
And now from the new paper:
In our example, there are 12 sources (i.e., extractor-webpage pairs) for USA and 12 sources for Kenya; this seems to suggest that USA and Kenya are equally likely to be true. However, intuitively this seems unreasonable: extractors E1 − E3 all tend to agree with each other, and so seem to be reliable; we can therefore “explain away” the Kenya values extracted by E4 − E5 as being more likely to be extraction errors.
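The “explain away” intuition can be sketched with a toy weighted vote. This is an assumption-laden illustration, not the paper’s method, which is a probabilistic model: the rows in `extractions` are invented (and fewer than the 12 sources per value in the researchers’ example), the numbers in `reliability` are simply asserted rather than learned, and `tally` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical extractions: (extractor, webpage, extracted birthplace).
# These rows are invented for illustration; they are not the paper's data.
extractions = [
    ("E1", "W1", "USA"), ("E2", "W1", "USA"), ("E3", "W1", "USA"),
    ("E1", "W2", "USA"), ("E2", "W2", "USA"), ("E3", "W2", "USA"),
    ("E4", "W1", "Kenya"), ("E5", "W1", "Kenya"),
    ("E4", "W2", "Kenya"), ("E5", "W2", "Kenya"),
    ("E4", "W3", "Kenya"), ("E5", "W3", "Kenya"),
]

# Assumed extractor reliabilities. A real system would have to estimate
# these from the data; here they are hard-coded to show the effect.
reliability = {"E1": 0.9, "E2": 0.9, "E3": 0.9, "E4": 0.3, "E5": 0.3}

def tally(rows, weights=None):
    """Sum votes per value; each vote counts 1.0 unless weights are given."""
    totals = defaultdict(float)
    for extractor, _page, value in rows:
        totals[value] += 1.0 if weights is None else weights[extractor]
    return dict(totals)

raw = tally(extractions)                    # every source counts equally
weighted = tally(extractions, reliability)  # unreliable extractors count less

print("raw votes:", raw)
print("weighted votes:", weighted)
```

Counted naively, the two values tie. Once the votes from the less reliable extractors are discounted, USA clearly wins, which is the gist of what the researchers describe.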
And thus, before our eyes, algorithms begin to erode politicized disinformation.
Substitute “Barack Obama was born in the United States” with “Global warming is mostly caused by human activities” or “Childhood vaccines do not cause autism,” and you can quickly see how potentially disruptive these algorithms could be. Which is precisely why, if Google really starts to look like it’s heading in this direction, the complaints will get louder and louder.
I say bring them. The late Sen. Daniel Patrick Moynihan famously observed that “Everyone is entitled to his own opinions, but not to his own facts.” The problem in the U.S. over the past decade in particular, however, is that everyone does seem to have his own facts — at least around certain politicized topics.
But if anyone can bring us back to a shared reality, well, it’s Google.