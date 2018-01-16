

President Trump speaks during a news conference in the East Room of the White House. (Evan Vucci/AP)

This is not a story about President Trump. It is a story about computers.

Well, to be fair, it’s about both. It is about how an automated process might try to replicate the way Trump speaks and writes — and why that’s harder than it might seem.

As a baseline, consider this. If you wanted to, you could probably generate a fairly realistic-sounding Trump sentence off the top of your head. Something like: “The fake news media won’t report on the great numbers from the economy, sad!” Easy enough. Were I to present that to you as something Trump said (which, to my knowledge, he didn’t), you would probably not question that assertion.

If I presented, on the other hand, something such as: “Tomorrow Wisconsin! I congratulated him information — I want chain migration, lottery system, right? And Wisconsin or obstructionist. We will be discussing. Donald Trump administration, especially in a lot to do it”?

… I think you’d be more skeptical.

And rightly so. That bit of text was generated using an automated tool called a Markov chain generator. Markov chains are more complicated than I need to get into here, so let me just explain what the generator does.

The team at Factba.se has been compiling every speech, tweet and interview from Trump as president and generously shared all that text with The Post. With that in hand, I was able to do a simple calculation: When Trump said x, what word y was most likely to follow?
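For the curious, that calculation can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the tool: the function names and the tiny corpus are invented here, and the sketch ignores punctuation and sentence boundaries, which the real tool may well handle differently.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_followers(text):
    """Map each word x to a Counter of the words y that immediately follow it."""
    words = tokenize(text)
    followers = defaultdict(Counter)
    for x, y in zip(words, words[1:]):
        followers[x][y] += 1
    return followers


def most_likely_next(followers, word):
    """Return the most frequent follower of `word`, or None if it has none."""
    counts = followers.get(word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Toy corpus, invented for illustration only (not the Factba.se data).
corpus = "we are going to win. we are going to be great. it is going to be big."
followers = build_followers(corpus)
print(most_likely_next(followers, "going"))  # to
print(most_likely_next(followers, "to"))     # be
```

With a real corpus, the same table would answer the question for every x at once: look up x, then take the follower with the highest count.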

The most common combination of words Trump used from his inaugural speech through last week was “going to.” The word most likely to follow “to” was “be.” The word most likely to follow “be” was “a.” Then “lot.” Then “of.” “The.” “United.” “States.” And so on. String those together and you get a sentence — “Going to be a lot of the United States” — that Trump actually never said.

Because it … doesn’t make much sense. There’s a concept in computer animation, which may be familiar to you, called the “uncanny valley.” The idea is that as visual representations of people become more and more realistic, they eventually reach a point where the image is much more realistic than, say, a painting, but is somehow … off. It looks human-ish, but there is some clear indicator that it was computer-generated. Something, maybe in the eyes, gives it away. So we accept a low-res, pixelated video-game character as sufficiently human for our purposes, but past a certain point an image is human enough that we can’t avoid noticing where it isn’t.

This is generally a function of technology. As computer effects improve, so, too, does our ability to create characters that seem legitimately human instead of bad digital copies. Our Markov chain generator has the same problem: It’s only a rough approximation of how a human (here, the president) speaks, and where it’s off, we notice.

That said, the tool gets closer to Trump than you might expect. Like: “Dangerous regime. Thank Melania & myself, thank everybody, but I think that determination.” That’s not totally out of the realm of something that might be in a Trump speech. His phrasings are both unusually familiar and unusually unpredictable, which, in this case, probably helps the generated text seem less surreal.

Early iterations produced a lot of gibberish, such as “Economy is yet, they have to be very and I we have we are going we have to I think and absolutely and I I think it’s thank you have I won’t have.” To make the tool seem a bit more authentic, I didn’t simply pick the next most likely word from the database of what Trump has said. Instead, the tool chooses each next word in one of three randomized ways:

Picking from among the most common words to follow the original word (this happens 50 percent of the time)

Picking totally randomly from words that had followed the word (15 percent of the time)

Picking from among the longest words to follow the original word (35 percent of the time)
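The three options above can be sketched as a weighted choice between strategies. This is a guess at the general shape, not the tool’s actual code: the names are hypothetical, the follower counts are made up, and the size of the “from among” pool (top_n) is an assumption, since the article doesn’t say how many candidates each option considers.

```python
import random
from collections import Counter

# Strategy weights taken from the percentages listed above.
STRATEGIES = ["common", "random", "longest"]
WEIGHTS = [50, 15, 35]


def candidate_pool(counts, strategy, top_n=3):
    """Return the words a given strategy is allowed to choose from.

    `counts` is a Counter mapping follower word -> how often it followed.
    `top_n` is an assumed pool size, not something the source specifies.
    """
    if strategy == "common":
        return [word for word, _ in counts.most_common(top_n)]
    if strategy == "longest":
        return sorted(counts, key=len, reverse=True)[:top_n]
    return list(counts)  # "random": any word that ever followed


def pick_next(counts, rng, top_n=3):
    """Pick a strategy by weight, then pick uniformly from its pool."""
    strategy = rng.choices(STRATEGIES, weights=WEIGHTS, k=1)[0]
    return rng.choice(candidate_pool(counts, strategy, top_n))


# Invented follower counts for a single word, for illustration only.
counts = Counter({"be": 40, "the": 25, "do": 10, "make": 8, "congratulate": 1})
rng = random.Random(0)  # seeded so repeated runs give the same picks
print([pick_next(counts, rng) for _ in range(5)])
```

The “longest words” option is what lets a rare word like “congratulate” surface far more often than its raw frequency would suggest, which may be part of why the output feels less repetitive.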

But, again, it’s still a rough approximation. The tool tries to build a sentence from the word or letters you enter; if it can’t match that word, it picks a word at random.

Naturally, I included an option to tweet the result.

I undertook this because I was curious about how close the result might be to something Trump would say. How predictable are his patterns of speech? The tool would no doubt be improved by considering more of the context around a word, such as by looking at the words that usually follow two-word phrases. As it is, though, it occasionally gets close enough to approximate the president in a believable way.
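To make the two-word idea concrete, here is a small Python sketch of what that extra context might look like. It is not something the tool does; the function name and the sample sentence are invented for illustration.

```python
from collections import Counter, defaultdict


def build_pair_followers(words):
    """Map each two-word phrase (w1, w2) to a Counter of the words that follow it."""
    followers = defaultdict(Counter)
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        followers[(w1, w2)][w3] += 1
    return followers


# Invented sample text, for illustration only.
words = "a lot of the people love the united states of america".split()
pairs = build_pair_followers(words)

# On its own, "the" is followed by both "people" and "united".
# With one more word of context, the choice is no longer ambiguous.
print(pairs[("of", "the")].most_common(1))    # [('people', 1)]
print(pairs[("love", "the")].most_common(1))  # [('united', 1)]
```

The trade-off is that longer contexts need much more text to fill in: many two-word phrases appear only once, so the generator increasingly just replays what was actually said.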

Does that say something about Trump? About language? About the sophistication of PHP code? Or is it just that I primed you to look for examples in which the bot sounds more like Trump and you’re hoping to fill that pattern?

Incidentally, if you keep going with the most common words to follow “going to,” you end up in an endless loop. “Going to be a lot of the United States of the United States of the United States of the United States of the United States of the United States of the United States,” and so on.

This, more than anything, sounds like a politician.
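For anyone wondering why that loop is unavoidable, a short Python sketch shows it. When the next word depends only on the current word and you always take the single most common follower, the first repeated word sends the chain around the same circuit forever. The follower counts below are invented to mirror the sequence described above; they are not the real tallies.

```python
from collections import Counter


def greedy_chain(followers, start, max_words=30):
    """Follow the most common next word, stopping at the first repeated word."""
    chain = [start]
    seen = {start}
    while len(chain) < max_words:
        counts = followers.get(chain[-1])
        if not counts:
            break  # dead end: this word was never followed by anything
        nxt = counts.most_common(1)[0][0]
        chain.append(nxt)
        if nxt in seen:
            break  # a repeat means the chain would now cycle forever
        seen.add(nxt)
    return chain


# Hypothetical counts chosen to reproduce the article's sequence.
followers = {
    "going": Counter({"to": 5}),
    "to": Counter({"be": 4, "do": 2}),
    "be": Counter({"a": 3}),
    "a": Counter({"lot": 3}),
    "lot": Counter({"of": 3}),
    "of": Counter({"the": 6}),
    "the": Counter({"united": 4, "people": 3}),
    "united": Counter({"states": 4}),
    "states": Counter({"of": 2}),
}

print(" ".join(greedy_chain(followers, "going")))
# going to be a lot of the united states of
```

The randomized options described earlier are one way out of the loop, since they occasionally pick something other than the top follower.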