If you ever wondered how the vocabularies of the world’s elite rappers stand up to literary heavyweights such as William Shakespeare and Herman Melville, your wait is over. There’s an interactive graph for that.
Matt Daniels, a self-described designer, coder and data scientist, has given hip-hop the FiveThirtyEight treatment, quantifying wordsmithing with his own methodology:
He used the first 5,000 words from seven Shakespeare works: “Hamlet,” “Romeo and Juliet,” “Othello,” “Macbeth,” “As You Like It,” “Winter’s Tale” and “Troilus and Cressida.” To compare rappers to Herman Melville, he analyzed the first 35,000 words from “Moby Dick,” offering this explanation:
Literary elites love to rep Shakespeare’s vocabulary: across his entire corpus, he uses 28,829 words, suggesting he knew over 100,000 words and arguably had the largest vocabulary, ever.
I decided to compare this data point against the most famous artists in hip hop. I used each artist’s first 35,000 lyrics. That way, prolific artists, such as Jay-Z, could be compared to newer artists, such as Drake.
Daniels analyzed 85 individual artists or groups (and, in the comments section of io9, responded to critiques by saying that he will add more). One of his favorite groups, Outkast, is fairly prolific. (He’s dedicated an entire website to explaining them in charts and graphs, perhaps the most wonkish assessment in the history of hip-hop.) In his Shakespeare analysis, Daniels identified 5,170 unique words; Melville clocks in at 6,022. Outkast had 5,212. Also in Shakespeare’s neighborhood? Nas, the Beastie Boys, MF Doom and E-40. But the top honor goes to Aesop Rock, who delivers a stunning 7,392 unique words. Once you get above 5,700 words, the graph is dominated by people or groups widely considered the thinking man’s rap artists: Wu-Tang Clan (5,895), its members Ghostface Killah (5,774), GZA (6,426), and RZA (5,905), and one affiliate, Killah Priest (5,737). The Roots also make an appearance in the verbal stratosphere with 5,803 words.
It’s not a comprehensive assessment, by any means. Much like their real-life representation in hip-hop, women are sparse. So far, Missy Elliott (3,874), Nicki Minaj (4,162), Lil’ Kim (4,474) and Salt-N-Pepa (3,612) are the only female artists whose lyrics Daniels has analyzed, which leaves out Trina, Foxy Brown, Eve, MC Lyte, Queen Latifah and Jean Grae. Lauryn Hill seems like an obvious choice, but 35,000 words is the equivalent of three to five albums, Daniels says. Hill has only two solo albums: “The Miseducation of Lauryn Hill” and “MTV Unplugged No. 2.0.”
And the system arguably has its flaws. Daniels said:
I used a research methodology called token analysis to determine each artist’s vocabulary. Each word is counted once, so pimps, pimp, pimping, and pimpin are four unique words. To avoid issues with apostrophes (e.g., pimpin’ vs. pimpin), they’re removed from the dataset. It still isn’t perfect. Hip hop is full of slang that is hard to transcribe (e.g., shorty vs. shawty), compound words (e.g., king s—), featured vocalists, and repetitive choruses.
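For the curious, the counting Daniels describes can be approximated in a few lines of Python. This is a rough sketch, not his actual code: the function name, the lowercasing step and the rule for splitting text into words are all assumptions made for illustration. It strips apostrophes, keeps only the first 35,000 words and counts each distinct spelling once.

```python
import re

def unique_word_count(text, limit=35_000):
    """Count distinct words among the first `limit` words of `text`.

    Hypothetical helper for illustration only; not Daniels' code.
    """
    # Remove straight and curly apostrophes so "pimpin'" matches "pimpin".
    # Lowercasing is an assumption; the article does not mention case.
    cleaned = re.sub(r"['\u2018\u2019]", "", text.lower())
    # Assumption: a word is any unbroken run of letters or digits.
    words = re.findall(r"[a-z0-9]+", cleaned)
    # Cap the sample so prolific and newer artists are compared equally.
    sample = words[:limit]
    # Every distinct spelling counts once; no stemming is applied.
    return len(set(sample))

if __name__ == "__main__":
    print(unique_word_count("pimps pimp pimping pimpin' pimpin"))
```

Because nothing is stemmed, “pimps,” “pimp,” “pimping” and “pimpin” register as four separate words, just as Daniels describes, and variant spellings such as “shorty” and “shawty” are counted twice.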
Still, it’s fun to check out where your favorite artist ranks, or who’s beating whom in the race to the bottom. So far, DMX has that distinction sewn up with 3,214 words. Maybe less growling and more verbiage on the next album, hmm, DMX?