Last week I posted the above picture from the NYU SMaPP lab that showed how Democrats and Republicans used different words when they were tweeting in the 24 hours surrounding the shutdown. Now my colleague at NYU-Abu Dhabi, political scientist Adam Ramey, has used the last 1,000 tweets from each member of Congress to estimate their ideological positions in both the House and the Senate. He explains:
To get at this, I did what is known as a “scrape” of Twitter accounts of both members of the US House and Senate.* This allows us to know exactly what words each member of Congress uses and how often they use them. We can apply multidimensional scaling techniques to try to extract what is known as “latent dimensionality” in the data. Put simply, perhaps we believe that liberals might use some words more often than conservatives or that moderates use some words more than extremists. What scaling techniques seek to do is see whether a whole slew of words separate legislators based on some unobserved, underlying factors. For example, words like obamacare, debt, Benghazi, and defund may be used to convey different individual messages, but at the end of the day they are all “conservative” words – they are words used almost exclusively by conservatives.**
Here is the result of Ramey’s analysis. First, we see the House, with Republicans in red and Democrats in blue (the words themselves are in grey). Note that the X-axis and Y-axis contain labels (partisan vs. constituency focused; liberal vs. conservative). These are applied by the analyst — in this case, Ramey — and are not something determined by the model, which simply identifies the existence of the two dimensions of “underlying factors” to which Ramey refers.***
What is immediately clear is the gap between the Democrats and Republicans in the House. There is quite a bit of space between the two parties, and very little overlap. Here, by contrast, is the Senate:
A very different picture indeed! While there are of course still important differences between the members of the different parties, the gap is much smaller than in the House and there are some areas of overlap. In fact, Senator Thad Cochran of Mississippi, a Republican, actually clusters very close to the Democrats. And keep in mind, these are the actual words of the members of Congress themselves (or their staff members who tweet for them) with no filtering by the news media. So even as we look at exactly what the members of Congress want us to see, we find the Senate apparently a much more congenial space for compromise than the House.
*I took the last 1,000 tweets by each member and then applied some standard cleaning techniques to the raw data (i.e., removing punctuation and stop words – the, and, but, for, to, etc.). I also “stemmed” the document – that is, I reduced each word to its essential stem by removing suffixes. Stemming is used by text analysts because English words that share a common root are often talking about the same thing. For example, the words education, educate, educational, and educators all have the same root, educ, and are generally talking about the same thing. Thus, rather than treat all of these words as different, we stem each word and count the stem instead.
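For readers who want to see what that cleaning step looks like in practice, here is a minimal Python sketch. It is not Ramey's code: the stop-word list, the suffix list, and the function names (crude_stem, clean_tweet, word_counts) are all made up for illustration, and the toy suffix stripper stands in for a real stemmer such as Porter's, which handles far more cases.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not the one used in the analysis.
STOP_WORDS = {"the", "and", "but", "for", "to", "a", "of", "in", "on", "is"}

# Assumption: a hand-picked suffix list, longest first, just enough to
# reproduce the "educ" example from the footnote above.
SUFFIXES = ("ational", "ation", "ators", "ator", "ate", "ing", "ed", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_tweet(text):
    """Lowercase, drop punctuation and stop words, then stem each token."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [crude_stem(t) for t in tokens]


def word_counts(tweets):
    """Count stems across all of one legislator's tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(clean_tweet(tweet))
    return counts
```

Running word_counts over each member's tweets gives one row of the legislator-by-word table that the scaling step needs. With this toy stemmer, education, educate, educational, and educators all collapse to educ.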
**The specific scaling technique I used is called correspondence analysis. The technique takes a contingency table – in this case, a matrix of word counts (legislators on the rows and words on the columns) – and decomposes it using the singular value decomposition. Some preprocessing is done to account for the idea that some legislators tweet more often than others and some words are used more often than others. In what follows, I removed the root hous from all US House tweets and sen and senat from the Senate tweets, as these are used by almost all members and do not convey any information.
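To make the decomposition a bit more concrete, here is a minimal NumPy sketch of textbook correspondence analysis. Again, this is not Ramey's code: the function name correspondence_analysis and the tiny toy_counts table are invented for illustration, and the sketch assumes every legislator and every word appears at least once (no all-zero rows or columns).

```python
import numpy as np


def correspondence_analysis(counts, n_dims=2):
    """Textbook correspondence analysis of a legislators-by-words count table.

    Assumes no row or column of `counts` sums to zero.
    """
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()              # table of proportions
    r = P.sum(axis=1)            # row masses: how much each legislator tweets
    c = P.sum(axis=0)            # column masses: how common each word is

    # Standardized residuals: observed minus expected under independence,
    # scaled so prolific tweeters and common words do not dominate.
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)

    U, sv, Vt = np.linalg.svd(S, full_matrices=False)

    # Principal coordinates for legislators (rows) and words (columns).
    row_coords = (U / np.sqrt(r)[:, None]) * sv
    col_coords = (Vt.T / np.sqrt(c)[:, None]) * sv
    return row_coords[:, :n_dims], col_coords[:, :n_dims], sv[:n_dims]


# Hypothetical toy table: 4 legislators (rows) by 4 word stems (columns).
toy_counts = [
    [8, 6, 1, 0],
    [7, 5, 0, 1],
    [1, 0, 9, 6],
    [0, 1, 7, 8],
]
rows, cols, sv = correspondence_analysis(toy_counts)
```

In this toy table the first two legislators lean on the first two words and the last two lean on the others, so the first column of rows puts the two pairs on opposite sides of zero. Which side ends up positive is arbitrary, which is one more reason the axis labels have to come from the analyst rather than the model.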
***The first dimension (x-axis) seems fairly obviously to be some form of ideology. As for the second dimension, Ramey writes:
What is the second dimension? That’s a bit harder to discern, but for the House it appears to capture partisan grandstanding vs. constituency focus. We might call this the “style” of the Congress member. Individuals with negative scores (see Figure 2) on the second dimension are talking about their constituents and local events (Robert Hurt, Sanford Bishop, Mike McIntyre). Members on the positive side are rehearsing partisan talking points (Nancy Pelosi and John Boehner).
My personal take is that while this may make sense for the House, it is not quite as clear that this is the correct characterization of the y-axis in the Senate.