The Washington Post

This is how Twitter’s new anti-harassment filter works. (Surprise! It works really well.)

I have never been so happy to see a Twitter notification. (Twitter)

Twitter has taken its biggest step yet against harassment. And here’s the incredible thing — cue parades, confetti, brass bands, etc. — this new step actually works.

Twitter is calling it a “quality filter,” and it’s been rolling out to verified users running Twitter’s iOS app since last week. It appears to work much like a spam filter, except instead of hiding bots and copy-paste marketers, it screens “threats, offensive language, [and] duplicate content” out of your notifications feed.


On Tuesday — with the help of some sockpuppets and obliging Twitter friends — I ran a little experiment to figure out exactly what that means and how well it works. First I registered a bunch of throwaway Twitter handles under fake e-mail addresses and tweeted rude things to myself. (This is, FYI, a process that takes roughly 60 seconds per account — which is why Twitter’s struggled to control it.) Later, I asked my followers to tweet mean things to me, even persuading one of my very nice co-workers to tweet that she was going to kill me.

In pretty much every case, Twitter’s quality filter blocked the outrageously obscene and threatening tweets, while still allowing the merely critical and disagreeable in. It blocked, for instance, various one-word expletives, comments on my appearance, all tweets (offensive and not) from my obvious troll accounts, spammy messages from some guy who tweets me the same thing all the time, and my co-worker’s promise to “kill you” (me). It did not, however, block more benign criticisms of my work, or the words “rape” or “kill” used in a news setting.

In other words, the quality filter — like the thing protecting your inbox from spam — is pretty sophisticated; it’s not perfect, and it’s not a cure-all, but it does more than just skim out tweets with swear words. Added bonus: The offending tweeter is never notified, and doesn’t know his messages have gone unread. Also, the tweets aren’t deleted or blocked or otherwise “censored.” They’re still available on the tweeter’s public feed, and even in the recipient’s notifications tab on desktop. This essentially just gives victims an easy, technological way to avoid truly nasty messages if they want.

Unfortunately, the game-changing tool is so far available to less than 1 percent of Twitter’s user base. Twitter declined to give a timeline for a wider rollout, saying only that the feature “is currently rolling out to verified users on iOS.”

The filter will also need some refining, as these things always do; we arguably don’t want to miss all tweets from all accounts just because they’re new. (I found, for instance, that the quality filter blocked even messages like “hi” when they came from a troll account.) Critics have also noted that the feature won’t work so well for users fielding very serious threats, who may need to read threatening tweets in order to protect themselves.

Still, it’s worth noting — and maybe celebrating — Twitter’s most substantial move against abuse on its network. Since admitting, in an internal memo in February, that Twitter “sucks at dealing with abuse and trolls,” CEO Dick Costolo has overseen a number of changes — including a new feature that makes it easier to report abuse to police, a streamlined reporting process and an effort to eliminate serial trolls by tracking their phone numbers. But critics have called those changes cosmetic, and plenty of high-profile Twitter-users — including the U.S. ambassador to Libya and actress Ashley Judd — have since fallen victim to abuse.


The quality filter finally looks like it could change that, though it’s currently reserved for a privileged few. When Twitter makes the quality filter available to everyone, we’ll know the site is really serious about dealing with abuse.
