Yet television news has received far less attention than online news because broadcast content is so difficult to monitor. While a few lines of computer code are enough to scrape an online news website, monitoring a television station requires extensive hardware and specialty software.
But now the Internet Archive has met these challenges with its TV News Archive, which has preserved nearly 2 million hours of television news from January 2009 to the present. These data include more than 5.7 billion words of closed captioning drawn from more than 150 distinct stations at the archive's peak coverage. All of it is publicly available.
Over the past year and a half, my GDELT Project has worked closely with the Internet Archive to visualize how U.S. television networks have covered the tumultuous 2016 election cycle. One of our tools, the 2016 Candidate Television Tracker, counts how many times each candidate is mentioned daily and has been used by media outlets such as The Atlantic, The Washington Post, FiveThirtyEight, Politico and The Guardian.
This past December, we expanded this tool into our new Television Explorer, which lets you search for any keyword or phrase and returns a seven-year timeline of how often it has been mentioned on each station monitored by the Internet Archive. The results can be easily downloaded for further analysis.
This tool turns up some fascinating gems. For example, well before President Trump made distrust of the news media a theme of his 2016 campaign, you can see that attitude among his supporters. This 2011 clip from MSNBC shows a Trump fan being asked, “Where do you get most of your news?” and answering: “You can’t get it from the media because they are part of the problem. … [I get my news] from my neighbor; he gets it off the computer.”
One of the most powerful features of the new tool is its ability to compare how the different national networks have covered key topics in the 2016 race.
For example, the graph below shows the percentage of all sentences spoken on each of the national networks monitored by the Internet Archive from Jan. 1, 2015, to Jan. 23, 2017, that mentioned “Clinton” within four sentences of “email,” “emails” or “server.”
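The co-occurrence measure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Television Explorer's actual implementation: it assumes the captions have already been split into sentences, uses simple case-insensitive whole-word matching, and treats "within four sentences" as a window reaching four sentences in either direction. The function name `cooccurrence_share` is hypothetical.

```python
import re

# Hypothetical sketch; not the Television Explorer's actual implementation.
# Whole-word, case-insensitive patterns for the target and context terms.
TARGET = re.compile(r"\bclinton\b", re.IGNORECASE)
CONTEXT = re.compile(r"\b(?:email|emails|server)\b", re.IGNORECASE)


def cooccurrence_share(sentences, window=4):
    """Return the percentage of sentences that mention the target term
    and have a context term within `window` sentences in either direction
    (assumed window definition)."""
    if not sentences:
        return 0.0
    # Precompute which sentences contain any context term.
    has_context = [bool(CONTEXT.search(s)) for s in sentences]
    matches = 0
    for i, sentence in enumerate(sentences):
        if not TARGET.search(sentence):
            continue
        # Clamp the window to the bounds of the transcript.
        lo = max(0, i - window)
        hi = min(len(sentences), i + window + 1)
        if any(has_context[lo:hi]):
            matches += 1
    # Normalize by all sentences, so busier stations are not over-weighted.
    return 100.0 * matches / len(sentences)


captions = [
    "Clinton spoke today.",
    "Weather is nice.",
    "Questions about the server continue.",
    "Sports next.",
]
print(cooccurrence_share(captions))            # 25.0
print(cooccurrence_share(captions, window=1))  # 0.0
```

Dividing by every sentence aired, rather than by raw mention counts, is what allows stations with different amounts of airtime to be compared on the same scale. In this toy transcript, one of four sentences qualifies with the default window, while narrowing the window to one sentence drops the "server" mention out of range.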
Trump’s infamous “grab them by the p—-y” statement also received a whopping 274 percent more attention on CNN than on Fox News, though MSNBC featured it prominently as well.
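A "percent more" figure is a relative difference, not a ratio. As a short worked example, assuming each network's attention is measured as its share of sentences aired (the same normalization used in the graph described above; the symbols are illustrative), 274 percent more attention means CNN's share was roughly 3.74 times that of Fox News:

```latex
\frac{s_{\text{CNN}} - s_{\text{Fox}}}{s_{\text{Fox}}} \times 100 = 274
\quad\Longrightarrow\quad
s_{\text{CNN}} = \left(1 + \frac{274}{100}\right) s_{\text{Fox}} = 3.74\, s_{\text{Fox}}
```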
This demonstrates the strong polarization of television coverage of the race. The behavior of Fox News and MSNBC is predictable. More surprising, perhaps, is how much attention CNN devoted to stories potentially detrimental to Trump, like Russian hacking and the Access Hollywood tape.
Here are some other findings. The Black Lives Matter movement received more than twice as much attention on Fox News as on CNN, while “immigration” received relatively equal coverage across the networks. The language used to describe these issues also differed: CNN used the term “illegal immigration,” while Fox News used the term “illegals” more than three times as frequently as any other network. The label “snowflakes” appeared almost exclusively on Fox News.
Most recently, over the past two months, CNN has paid nearly five times as much attention as Fox News to Trump’s calls to move the U.S. Embassy in Israel from Tel Aviv to Jerusalem.
All of this shows how the Internet Archive’s incredible repository of television news can be combined with tools that transform it into data. For the first time, we can explore key questions about the media using not small samples but the entirety of cable network broadcasts over the better part of a decade.
Kalev Leetaru is a senior fellow at the George Washington University Center for Cyber and Homeland Security.