An interesting new analysis comes from the team at the Data Face, Jack Beckwith and Nick Sorscher. I asked them a few questions about what they found. Below is a lightly edited transcript.
How did you go about measuring media coverage of Hillary Clinton and Donald Trump?
We compiled a total of 21,981 articles written about the election dating back to July 1, 2015. To be included in our data set, each article had to reference either Donald Trump or Hillary Clinton in its headline (but not both). The articles came from the websites of eight major media outlets: the New York Times, The Washington Post, Chicago Tribune, Wall Street Journal, Slate, Politico, Fox News and the Weekly Standard. We wanted a mixture of liberal and conservative outlets, at least according to conventional wisdom.
We looked at the number of articles that were published about each candidate over time, which captures their ability to dictate the news cycle. And using the actual text of the articles, we evaluated the tone of the coverage — how positive or negative it was toward each candidate — and how it has shifted throughout the campaign.
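As a rough illustration of the inclusion rule and the monthly counts described above, here is a minimal Python sketch. It is not the Data Face's code: the function names, the `(headline, date)` input shape and the simple substring match are all assumptions made for the example. A real pipeline would need to handle cases such as headlines about Bill Clinton.

```python
from collections import Counter
from datetime import date


def candidate_in_headline(headline):
    """Return 'Trump' or 'Clinton' if exactly one is named, else None.

    Assumption: a plain substring match on the surname stands in for
    whatever matching rule the original analysis used.
    """
    text = headline.lower()
    has_trump = "trump" in text
    has_clinton = "clinton" in text
    if has_trump == has_clinton:  # both named, or neither named
        return None
    return "Trump" if has_trump else "Clinton"


def monthly_counts(articles):
    """Count qualifying articles per (candidate, 'YYYY-MM') pair.

    `articles` is an iterable of (headline, publication_date) tuples,
    a hypothetical input format chosen for this sketch.
    """
    counts = Counter()
    for headline, published in articles:
        candidate = candidate_in_headline(headline)
        if candidate is not None:
            counts[(candidate, published.strftime("%Y-%m"))] += 1
    return counts
```

Calling `monthly_counts` on a list of scraped headlines would give the per-month totals that a chart of coverage volume could be drawn from.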
How do you know whether a story is positive or negative about either of the candidates?
We did this via a computer algorithm, which is becoming increasingly common as social scientists work with huge data sets of text. There are a variety of approaches to what’s often called sentiment analysis, but our methodology was this: for each article, the algorithm identified every adjective. Then, using a very large word bank, it scored the adjectives on a scale of -1.0 (most negative) to +1.0 (most positive). The computer then averaged those values to generate an overall sentiment score for each article.
This obviously isn’t perfect. A computer’s sense of sentiment can be tripped up by things like satire, slang or misspellings. But given that we were working with news articles (the Onion wasn’t among our outlets), we believe these concerns are less relevant. Moreover, sentiment analysis has been shown to be surprisingly effective in predicting the stock market, summarizing customer feedback and delineating a population’s political views.
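The adjective-averaging approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five-word `ADJECTIVE_SCORES` table is invented for the example and stands in for the "very large word bank," and a dictionary lookup stands in for real adjective identification, which would normally use a part-of-speech tagger.

```python
import re
from statistics import mean

# Hypothetical miniature word bank: adjective -> score in [-1.0, +1.0].
# The real analysis used a much larger lexicon; these values are made up.
ADJECTIVE_SCORES = {
    "great": 0.8,
    "strong": 0.5,
    "weak": -0.4,
    "dishonest": -0.7,
    "terrible": -1.0,
}


def article_sentiment(text):
    """Average the scores of known adjectives found in `text`.

    Returns 0.0 (neutral) when no scored adjective appears, which is
    an assumption of this sketch rather than a documented rule.
    """
    words = re.findall(r"[a-z']+", text.lower())
    scores = [ADJECTIVE_SCORES[w] for w in words if w in ADJECTIVE_SCORES]
    return mean(scores) if scores else 0.0
```

For example, a sentence containing "great" and "weak" would average their two scores, which shows why articles with few loaded adjectives tend to land near zero.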
Let’s talk about what you’ve found. First, how much coverage have the two candidates received in these outlets, and how has that changed over time?
The first thing that jumped out at us when we started examining our data was the sheer number of headlines in which Donald Trump’s name appeared. Across the eight outlets, we found Trump’s name mentioned in a total of 14,924 article headlines from July 1, 2015, to Aug. 31, 2016. Clinton has been mentioned in fewer than half as many.
Both candidates’ mentions have increased over the course of the campaign, although the increase in Trump’s mentions occurred earlier and at a faster rate than Clinton’s. Before August, Trump’s peak month was March 2016, largely driven by the increased coverage surrounding his presumptive nomination.
And what about the tone of that coverage? Is one candidate getting “worse” coverage than the other?
The short answer is that it varies over time and appears to depend on the most salient events. For example, Donald Trump’s worst weeks may have come in November 2015, around the time he suggested the need for a national database of Muslim citizens and also mocked a disabled New York Times reporter. The tone of Clinton’s media coverage seems to have suffered most when her private email server scandal came to the fore once again in early July 2016.
More broadly, it seems that tone is more volatile now. Earlier in the year, from mid-December through early April, the tone of Clinton’s coverage was consistently more positive than Trump’s when averaged across all eight outlets.
After that, the tone of Clinton’s coverage swung from her most positive week over the entire campaign in early June to her most negative in the span of four weeks. The tone of Trump’s coverage, meanwhile, has tended to decline since mid-April.
But over the past several weeks (our data goes through Sept. 13), neither candidate has had a consistent advantage.
What about the individual outlets? Are there differences in how they cover the candidates?
Yes, there are differences. For each media outlet, we took the articles in our sample and computed a median sentiment score for Clinton and Trump. We then took the difference between those scores as a way of gauging whether coverage of one candidate was more positive or negative than coverage of the other.
We found that all of the media outlets that we considered “liberal” treated Clinton more favorably. The more conservative outlets seemed more on the fence about Trump. In our sample of articles, only the coverage of Fox News was more positive toward Trump than Clinton, at least to a statistically significant degree. Coverage at the Weekly Standard, the Wall Street Journal and the Chicago Tribune didn’t clearly favor one candidate or the other.
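The per-outlet comparison described above (a median score for each candidate, then the difference) might look something like the following sketch. The `(outlet, candidate, score)` row format and the function name are assumptions for illustration, and the sign convention (Clinton minus Trump) is a choice made here, not one stated in the interview.

```python
from collections import defaultdict
from statistics import median


def median_gap_by_outlet(rows):
    """Return {outlet: median(Clinton scores) - median(Trump scores)}.

    `rows` is an iterable of (outlet, candidate, score) tuples, a
    hypothetical layout. A positive gap means the outlet's Clinton
    coverage scored higher. Outlets missing either candidate are skipped.
    """
    scores = defaultdict(lambda: {"Clinton": [], "Trump": []})
    for outlet, candidate, score in rows:
        scores[outlet][candidate].append(score)
    return {
        outlet: median(by["Clinton"]) - median(by["Trump"])
        for outlet, by in scores.items()
        if by["Clinton"] and by["Trump"]
    }
```

Using the median rather than the mean keeps a handful of extreme articles from dominating an outlet's score.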
It strikes me that the levels of positive or negative sentiment are fairly modest. You scored articles on a scale from -1 to +1, but the averages over time or within outlets are pretty close to 0. So it seems like any “bias” — if we want to call it that — isn’t large.
Sentiment scores do fall pretty close to zero when you look at the data for each candidate over time or by media outlet. There were certainly individual articles that fell closer to the extremes, but those are drowned out by a large swath of more neutral articles. This makes sense: Our sample consists of news articles, which feature more factual information and relatively few “loaded” adjectives. (We calculate medians here to ensure that outliers don’t affect our results.)
That being said, we’re still able to pick up on subtle differences in tone because of our large sample size. We’ve gathered almost 3,000 articles from The Washington Post alone since July 2015. Even a difference of 0.012 between Clinton and Trump registers as statistically significant. Of course, we can debate its substantive significance.
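The interview does not say which significance test was used. One common way to check whether a small gap between two medians is more than noise is a permutation test, sketched below purely as an assumption. The function name, the default of 2,000 shuffles and the fixed seed are all illustrative choices.

```python
import random
from statistics import median


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in medians.

    Repeatedly shuffles the pooled scores, splits them into groups of
    the original sizes, and counts how often the shuffled gap is at
    least as large as the observed one. This is a stand-in technique,
    not necessarily the test the Data Face ran.
    """
    rng = random.Random(seed)
    observed = abs(median(a) - median(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        gap = abs(median(pooled[:len(a)]) - median(pooled[len(a):]))
        if gap >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)
```

With thousands of articles per outlet, the shuffled gaps cluster tightly around zero, which is why even a very small observed difference can come out as statistically significant.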
Last question: What is The Data Face?
The Data Face is a website that I [Jack] started last October, during my senior year at the University of Pennsylvania. We’ve assembled a team of three “data journalists” who investigate topics in music, politics and sports. To date, we’ve tackled questions like which cities have produced artists with the most Billboard Hot 100 hits, where the 2016 Warriors rank in NBA history, and, of course, how positively (or negatively) the media has treated each 2016 presidential candidate. All of our articles rely on data that we collect ourselves and present in interactive visualizations.
We’re always on the lookout for interesting data sets to tinker with, so feel free to send us suggestions at firstname.lastname@example.org.