Christian Rudder, co-founder of the popular dating site OkCupid, has a résumé that itself sounds like a fictionalized dating profile. Besides starting a successful Internet company (sold in 2011 for $50 million), he’s the guitarist in the indie-pop band Bishop Allen, a movie actor (“Funny Ha Ha”) and a Harvard grad with a math degree. Throw in a penchant for long walks and cooking paella, and he’d be the most dateable man in America.

Now he can add “author” to his profile. His book, “Dataclysm: Who We Are (When We Think No One’s Looking),” builds on the popular OkTrends blog, which Rudder ran at OkCupid and which addressed questions of world-historical importance such as “How should you shoot your profile photo to get maximal interest?” (no flash, shallow depth of field) and “How do heavy Twitter users differ from other OkCupid members?” (they masturbate more frequently).

In “Dataclysm,” Rudder has grander goals. People on the Internet are constantly (and mostly willingly) sloughing off flakes of information. The resulting global cloud of informational cruft, Rudder says, makes possible a completely new way to do social science — to figure out, as he puts it in his subtitle, “who we are.” Yes, computers don’t understand humans very well. But they have their own advantages. They can see things whole that human eyes can handle only in part. “Keeping track is their only job,” Rudder says. “They don’t lose the scrapbook, or travel, or get drunk, or grow senile, or even blink. They just sit there and remember.”

That’s great if you’re a scientist or a monetizer of data trails. But the humans under study might quail a little to know, for instance, that OkCupid keeps track not only of what messages you send to your potential dates, but of the characters you type and then erase while you compose your little satchels of intriguingness. A beautiful scatterplot (the book is absolutely loaded with beautiful scatterplots) maps the messaging landscape. On one side of the plot you find the careful revisers, who draft and delete, draft and delete, typing many more characters than they eventually send. On the other side are those messagers who type fewer characters than they send. How is this possible? Because these are the copypasters, the diligent dates who see romantic approach as an opportunity for digital-age efficiency, sending identical “Hi there” blurbs to dozens of potential mates. It’s courtship in the age of mechanical reproduction.

Rudder has been quite open about OkCupid’s practice of experimenting on its customers, to the consternation of some. (At one point, the service started offering users matches that the algorithm secretly thought were terrible, just to see what would happen.) Experiments such as this are inherently deceptive; in Rudder’s view, they’re worth it, thanks to the opportunity they offer to study human behavior in the wild. He returns repeatedly to the theme that his data — which tracks what we do, not what we say we do — is a surer guide to our interiors than questionnaires or polls. People may say, for example, that they don’t have racial preferences in dating. But the data from OkCupid messages shows quite starkly that people are apt to contact romantic prospects from their own racial group. And it suggests that the real racial divide, as far as online dating goes, isn’t between white and non-white, but between black and non-black. “Data,” Rudder says, “is about how we’re really feeling,” unmediated by the masks we wear in public. That strikes me as too strong; I think most of us are still performing, even when we think no one’s watching. It’s masks all the way in. But it’s undeniable that Rudder and his fellow data-holders can see and analyze behavior previously invisible to science.

The material on race — perhaps because race is hard to talk about in public — is some of the strongest in the book. Rudder provides lists of phrases that are strongly preferred, or dispreferred, by whites, blacks, Latinos and Asians in their OkCupid profiles. The least black band in the world, it turns out, is Scottish indie-pop outfit Belle and Sebastian. (Caveat: I’ve seen Rudder’s own band play live, and I think it has to be in the running.) The lists are full of curiosities. Asian men are strongly inclined to put “tall for an Asian” in their profiles, in keeping with stereotypes about short stature being a dating liability for men. But Asian women also have “tall for an Asian” on their list of most-used phrases — why?

Rudder argues that hopeful singles are asking the wrong questions of their dates, focusing on topline items such as politics and religion, when subtler questions are more predictive. He observes that in three-quarters of OkCupid dates that eventually became committed relationships, the two partners gave the same answer to the question “Do you like scary movies?” That sounds impressive! But without more information, it’s hard to know exactly what to make of it. Horror movies are pretty popular. If, say, 70 percent of people like them, you’d expect 49 percent of couples (70 percent of 70 percent) to both say “yes” to that question by pure chance, and 9 percent (30 percent of 30 percent) to both say “no.” So you’d have 58 percent of couples agreeing even if a taste for gorefests were completely unrelated to romantic compatibility.
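The chance-agreement arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not anything from Rudder’s book: it assumes the two partners answer independently (the “no relationship at all” baseline), and the 70 percent figure is the review’s own hypothetical, not OkCupid data. The helper name chance_agreement is illustrative.

```python
def chance_agreement(p_yes: float) -> float:
    """Probability that two people answering independently give the same
    yes/no answer, when each says "yes" with probability p_yes."""
    both_yes = p_yes * p_yes              # both like scary movies
    both_no = (1 - p_yes) * (1 - p_yes)   # both dislike them
    return both_yes + both_no


# Hypothetical share of people who like horror movies (from the review's example).
p = 0.70
print(f"both yes: {p * p:.0%}")
print(f"both no:  {(1 - p) ** 2:.0%}")
print(f"agree:    {chance_agreement(p):.0%}")
```

Running it prints 49%, 9% and 58%. The same formula also shows that agreement by chance never drops below 50 percent (the minimum, at a 50/50 split), so any yes/no question will produce a fair amount of agreement between partners on its own; the interesting number is how far the observed 75 percent sits above that baseline.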

I had a few other quibbles like that. But the reason I had quibbles is that Rudder’s book offers you something to quibble with. Most data-hyping books are vapor and slogans. This one has the real stuff: actual data and actual analysis taking place on the page. That’s something to be praised, loudly and at length. Praiseworthy, too, is Rudder’s writing, which is consistently zingy and mercifully free of Silicon Valley business gabble.

Rudder compares his project to Howard Zinn’s “A People’s History of the United States.” The comparison took me by surprise, but it makes sense. Like Zinn, Rudder is looking for a social science that foregrounds aggregates, instead of individuals, and attends to subtle social movements that might not be visible to any single person. But “people’s history” has two meanings. It’s history of the people but also history by the people; a kind of investigation that’s not restricted to academics and experts. That’s the big question for the new social science of datasets. It’s clear we’re now all part of the study. Can we develop a people’s data science that allows us all to be the scientists, too?

Jordan Ellenberg is a professor of mathematics at the University of Wisconsin and the author of “How Not to Be Wrong: The Power of Mathematical Thinking.”


