Until recently, Patrick Juola was known primarily as the man who outed J.K. Rowling as the author of “The Cuckoo’s Calling,” a book she penned under another name.
Now Juola can add another high-profile outing to his resume. The computational forensic linguist, who has worked at the arcane intersection of language and computer programming since 1997, says Newsweek may have named the wrong man in its cover story about the founder of Bitcoin.
“This does let us say fairly confidently that [the founder of Bitcoin is] not Dorian,” John Noecker Jr., the chief scientific officer at Juola’s forensic analysis firm, Juola & Associates, told Forbes over the weekend. “If this were a [police] lineup, if everyone picks the same guy that doesn’t mean that guy is the perp. But if they’re picking the same guy and it’s not you, you’re not the perp.”
This probably sounds confusing, and it is, a bit. Juola’s forensic software runs so many micro-analyses of so many tiny, subconscious data points that it’s impossible to break them out one by one. (In trials, the technology has had a success rate of up to 90 percent.) Individually, Juola says, these data points are meaningless: little bits of trivia, like how frequently a writer uses the word “too,” or whether he refers to a piece of furniture as a “sofa” or a “couch.”
But when compiled by the hundreds or thousands, these micro-patterns form an elaborate map of an individual’s dialectal and linguistic habits. That map can be used to disqualify candidates as the writer of a particular piece, and to the layman it looks a heck of a lot like magic.
Remember those eerily accurate dialect quizzes that went viral in December? This operates on more or less the same set of principles.
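To make the idea of a "map" of micro-patterns concrete, here is a minimal sketch in Python. It is not Juola's software, whose internals the article does not describe; the marker list, the `profile` function and the per-1,000-words scaling are all assumptions chosen for illustration. It simply counts how often a handful of words like “too,” “sofa” and “couch” turn up in a writing sample.

```python
import re
from collections import Counter

# Hypothetical marker words, chosen only for illustration.
# A real system would track hundreds or thousands of features.
MARKERS = ["too", "sofa", "couch", "the", "of", "and"]

def profile(text, markers=MARKERS):
    """Return each marker's rate per 1,000 words in the given text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return {m: 1000 * counts[m] / total for m in markers}

sample = "The sofa is too big, too soft."
print(profile(sample))
```

Any single number here says almost nothing about who wrote the sample. The point is only that many such rates, taken together, give a program something to compare across writers.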
“The basic idea is that every word you say represents a choice,” Juola told me last week. “So when you speak, you’re making a million choices that you’re not aware of. The differences aren’t meaningful to the speaker.” But they are meaningful to a program that can catalogue and index all the possible options.
Even someone attempting to forge another person’s work couldn’t be cognizant of all those options, Juola says; in fact, they are difficult even for forensic linguists to understand. Juola says one drawback of this technology is that its outcomes aren’t always intuitive: when the program spits out an answer, based on a constellation of tiny patterns, you can’t point to one thing to explain it.
That said, computational linguistics is by no means foolproof. Co-authors and editors can garble the patterns that Juola’s program looks for. And the program only works when it’s choosing between candidates: for instance, it can look at writing samples from persons A, B and C and determine which one was most likely to have written a ransom note. But if person D actually wrote it, the computer would have no way of knowing.
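The limitation is easier to see in a small sketch. The Python below is an illustration under stated assumptions, not the method Juola & Associates uses: the word list, the function names and the choice of cosine distance are all hypothetical. It ranks a fixed set of candidates by how closely their word-frequency profiles match a disputed text.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a few common function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "too", "but"]

def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine_distance(p, q):
    """0 means the profiles point the same way, 1 means no overlap."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 if norm == 0 else 1.0 - dot / norm

def rank_candidates(disputed, samples):
    """Sort candidates from closest to farthest from the disputed text."""
    target = profile(disputed)
    scored = [(cosine_distance(target, profile(text)), name)
              for name, text in samples.items()]
    return sorted(scored)

samples = {
    "A": "the cat and the dog and the bird",
    "B": "it is too late to go in that",
}
print(rank_candidates("the fox and the hen and the owl", samples))
```

Note that `rank_candidates` always returns an ordering of the people it was given. If the true author is not among the samples, the closest candidate still comes out on top, which is why a result like this is better at ruling someone out than at naming the writer.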
That’s why, for this Bitcoin analysis, Juola & Associates ran writing samples from Dorian Satoshi Nakamoto — the man Newsweek outed — against (a) documents attributed to Bitcoin’s mysterious founder and (b) documents written by other people who have been named as possible founders of Bitcoin over the years.
Conclusion: Dorian was not the most likely writer. Neal J. King — a German programmer suspected by Fast Company in 2011 — was.
What about the popular Internet theory that Bitcoin’s founder is not one person, but a shadowy collective of people operating under the name? Juola is running that analysis today. We’ll update this post when it’s available. In the meantime, the curious might want to check out his white paper on how stylometry works and what its applications are: everything from Defense Department research to immigration trials, it turns out.