Example of a message designed to remind harassers of the humanity of their victims and to prompt them to reconsider the norms of online behavior. (Screen shot from Twitter by Kevin Munger/TMC)

Despite rising concern among the public, social-media companies have had little success stemming the wave of online harassment. As part of research recently published in the journal Political Behavior, I conducted an experiment on Twitter to find out the best tactics people can use to discourage other users from using harassing language. I found that these sanctioning messages do have an effect, but not in all contexts.

Twitter is certainly aware of this problem. As CEO Dick Costolo said in an internal memo in 2015, “We suck at dealing with abuse and trolls on the platform and we’ve sucked at it for years.” The company does suspend some of the more egregiously abusive accounts, and it has started implementing more sophisticated techniques like “shadow banning.”

However, there may be limits to the effectiveness of top-down efforts by companies that run social-media platforms. In the short run, heavy-handed sanctions like account bans can actually embolden users who are censored. There is excellent evidence that this happens in China when the regime employs censorship.

A better option might be to empower users to improve their online communities through peer-to-peer sanctioning. To test this hypothesis, I used Twitter accounts I controlled (“bots,” although they aren’t acting autonomously) to send messages designed to remind harassers of the humanity of their victims and to prompt them to reconsider the norms of online behavior.

The use of an experiment allowed me to tightly control the context for sanctioning. I sent every harasser the same message:

@[subject] Hey man, just remember that there are real people who are hurt when you harass them with that kind of language

I used a racial slur as the search term because I thought of it as the strongest evidence that a tweet might contain racist harassment. I restricted the sample to users who had a history of using offensive language, and I only included subjects who appeared to be white men or who were anonymous.

It was essential to keep the race and gender of the subjects constant to test my central question: How would reactions to my sanctioning message change based on the race of the bot sending the message?

To do so, I created two types of bots: white men and black men. To manipulate the race, I used the same cartoon avatar for the bots’ profile picture and simply changed the skin color. Using a method that has been frequently employed to measure discrimination in hiring, I also gave the bots characteristically white or characteristically black first and last names.

Here’s an example of “Greg,” a white bot:


“Greg”, the white bot used in the author’s study. (Screen shot from Twitter by Kevin Munger/TMC)

The picture at the start of this post is a screenshot of “Rasheed,” a black bot, in action.

To make the bots look more like real people, I followed some celebrities/news outlets and sent a number of harmless tweets (“Strawberry season is in full swing, and I’m loving it”).

I also varied the number of followers the bots had, to test the theory that “higher status” people are more effective at changing others’ behavior. To do this, I bought followers for half of the bots — 500 followers, to be specific — and gave the remaining bots only two followers each (see screenshot above). This represents a large status difference: a Twitter user with two followers is unlikely to be taken seriously, while 500 followers is a substantial number.

Overall, I had four types of bots: High Follower/White; Low Follower/White; High Follower/Black; and Low Follower/Black. My prediction was that messages from the different types of bots would function differently. I thought High Follower/White bots would have the largest effect, while Low Follower/Black bots would have only a minimal effect.
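
The four bot types are the cross of two two-level factors. As a minimal sketch (the names below are illustrative and are not taken from the study's own code), the design can be enumerated like this:

```python
from itertools import product

# Follower counts come from the article: 500 for "high", 2 for "low".
FOLLOWER_LEVELS = {"High Follower": 500, "Low Follower": 2}
RACE_LEVELS = ["White", "Black"]


def bot_types():
    """Return the 2x2 design as a list of (label, followers, race) tuples."""
    return [
        (f"{status}/{race}", followers, race)
        for (status, followers), race in product(FOLLOWER_LEVELS.items(), RACE_LEVELS)
    ]


# Print each condition label with its follower count.
for label, followers, race in bot_types():
    print(label, followers)
```

Crossing the factors this way makes it possible to separate the effect of the bot's apparent race from the effect of its follower count.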

I expected the white bots to be more effective than the black bots because all of my subjects were themselves white, and there is evidence that messages about social norms from the “in-group” are more effective than messages from the “out-group.” Race does not always define in-group/out-group status, but because these subjects were engaged in racist harassment, I thought that this was the most relevant group identity.

The primary behavior I hoped to change with my intervention was the subjects’ use of racist slurs. I tracked each subject’s Twitter use for two months and calculated the change in the use of a particular racial slur.
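
To make the outcome measure concrete, here is a rough sketch of how a change in average daily slur use could be computed for one subject. The function name, the symmetric seven-day windows, and the example dates are assumptions made for illustration; the article does not specify the baseline period or the exact calculation.

```python
from datetime import date, timedelta


def change_in_daily_rate(slur_dates, treated_on, window_days=7):
    """Average slur tweets per day after treatment minus the average before.

    slur_dates: one date per tweet containing the slur.
    Assumption: equal-length windows on each side, treatment day excluded.
    """
    window = timedelta(days=window_days)
    before = sum(1 for d in slur_dates if treated_on - window <= d < treated_on)
    after = sum(1 for d in slur_dates if treated_on < d <= treated_on + window)
    return (after - before) / window_days


# Made-up example: one slur tweet per day in the week before, none after.
treated_on = date(2016, 1, 8)  # hypothetical treatment date
slur_dates = [treated_on - timedelta(days=k) for k in range(1, 8)]
print(change_in_daily_rate(slur_dates, treated_on))  # -1.0
```

A negative value means the subject tweeted the slur less often after receiving the message.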

Only one of the four types of bots caused a significant reduction in the subjects’ rate of tweeting slurs: the white bots with 500 followers. The graph below shows that this type of bot caused subjects to tweet the slur an average of 0.3 fewer times per day in the week after being sanctioned.

Change in average daily slur use in the week following online sanctioning (Data and Figure: Kevin Munger)

Roughly 35 percent of subjects provided some personal information on their profile. The effects of my messages on this subset of non-anonymous Twitter users were strikingly different. Tweets from white bots with 500 followers did not cause a significant change in these users’ behavior, but tweets from black bots with few followers (the type of bots that I thought would have a minimal effect) actually caused an increase in the use of racist slurs.

The messages were identical, but the results varied dramatically based on the racial identity and status of the bot and the degree of anonymity of the subject.

Overall, I found that it is possible to cause people to use less harassing language. This change seems to be most likely when both individuals share a social identity. Unsurprisingly, high-status people are also more likely to cause a change.

Many people are already engaged in sanctioning bad behavior online, but they sometimes do so in a way that can backfire. If people call out bad behavior in a way that emphasizes the social distance between themselves and the person they’re calling out, my research suggests that the sanctioning is less likely to be effective.

Physical distance, anonymity and partisan bubbles online can lead to extremely nasty behavior, but if we remember that there’s a real person behind every online encounter and emphasize what we have in common rather than what divides us, we might be able to make the Internet a better place.

Kevin Munger is a Graduate Research Associate of the NYU Social Media and Political Participation (SMaPP) lab and PhD candidate in the Department of Politics at New York University.