What can data reveal about international relations? That's the question that the Global Database of Events, Language, and Tone (GDELT) has sought to answer. By analyzing data from English-language news sources, the project has compiled a huge database of people, organizations, locations, themes and events.
In a new project, GDELT researcher Kalev Leetaru has started making word clouds of world leaders. Each cloud uses GDELT data to show the top 100 names mentioned in articles about a specific world leader, sized by how often each name occurred. For example, the image at the top of the post shows the names mentioned in articles relating to Toomas Hendrik Ilves, the president of Estonia, between April 1, 2013 and March 18, 2014. In that cloud you can see that President Obama was the most frequently mentioned name in stories about Ilves, with Dalia Grybauskaitė, President of Lithuania, and Andris Berzins, President of Latvia, also mentioned often. New York Times columnist Paul Krugman, famously involved in a Twitter spat with Ilves, also makes an appearance.
Obama, as you might expect, is mentioned frequently across these clouds. Here's his own word cloud, which is perhaps most notable for including fewer international names than the others:
In an e-mail, Leetaru said that the only name that rivaled Obama's for dominance was that of Russian President Vladimir Putin, who appears among the top 100 names in 84 percent of the world leaders' clouds, while Obama appears in 96 percent of them (though within the top 10 percent of names mentioned, Putin appeared only 42 percent of the time, while Obama appeared 90 percent of the time). What's so remarkable about Putin's dominance, Leetaru explains, is that these clouds are built from English-language news: There should be a clear bias toward English-speaking leaders.
Here's Putin's own word cloud. Note how prominently former Ukrainian President Viktor Yanukovych and NSA whistleblower Edward Snowden appear.
Here are word clouds of some other world leaders. Where needed, we've provided a little context:
The names on Kim Jong-un's word cloud are a little unusual, including Adolf Hitler, Gwyneth Paltrow and "imprimir indicar," which appears to be a mistake.
The importance of the legacy of Hugo Chavez is made clear in the word cloud for Nicolas Maduro.
One of the biggest names mentioned in relation to Rwandan President Paul Kagame is Patrick Karegeya, a Rwandan dissident who turned up dead in South Africa on Jan. 1.
Pope Francis has an unusually broad selection of names mentioned.
As does Chinese President Xi Jinping, who has some unusual names on his list, including Clark Gable and Fred Astaire. Xi's name itself also doesn't appear, which Leetaru reasoned was probably due to different styles of transliteration.
Can we learn everything about these relationships from data? Of course not, and there are some important caveats, not least that the data has been collected only from English-language news stories. There also appear to be some anomalous results, and a lot of context is needed to truly understand how this all works.
Even so, the word clouds are an interesting way to consider the data collected by GDELT, and a great indication of the project's potential. Leetaru plans to add a tool to his Web site that will allow users to create their own word clouds. "That's where I see the power of big data," Leetaru said in an e-mail. "It is sort of like the ultimate research assistant – it can happily scour enormous volumes of material and give you the patterns it finds, like these word clouds, for the human to then interpret and make sense of."