What does Twitter say about your diet? According to an analysis of 50 million tweets, Mississippi loves cake, Virginia can't get enough bacon, and Colorado likes chocolate bars. And how do we burn off those calories? Judging by those same tweets, Colorado runs a lot, Virginia swims a bit, and Mississippi likes to dance.
A new online, interactive instrument built by researchers at the University of Vermont is using Twitter to count how many calories Americans consume and expend. Their tool, dubbed the Lexicocalorimeter, looks through tweets for food- and exercise-related words, like doughnut and treadmill, and runs them through a basic algorithm that ranks the words by their frequency and caloric implications.
The algorithm then uses a simple ratio of calories consumed to calories burned to calculate each state's caloric balance. Based on tweets made from the continental states during 2011 and 2012, the researchers found that Mississippi expended the fewest calories, with Colorado burning the most.
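As a rough illustration of the ratio described above, here is a minimal Python sketch. It is not the researchers' code: the word lists, the calorie values, and the names `FOOD_CALORIES`, `ACTIVITY_CALORIES`, `tokenize` and `caloric_ratio` are all hypothetical placeholders chosen for this example, and the real tool may weight and filter words quite differently.

```python
from collections import Counter
import re

# Hypothetical lookup tables. The numbers are illustrative placeholders,
# not values from the Lexicocalorimeter study.
FOOD_CALORIES = {"doughnut": 250.0, "bacon": 100.0, "cake": 300.0}
ACTIVITY_CALORIES = {"treadmill": 400.0, "running": 500.0, "dancing": 300.0}


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def caloric_ratio(tweets):
    """Return calories consumed divided by calories burned for a set of tweets.

    Each mention of a listed word contributes that word's calorie value once.
    Returns None when no activity words appear, since the ratio is undefined.
    """
    counts = Counter(word for tweet in tweets for word in tokenize(tweet))
    consumed = sum(counts[word] * cal for word, cal in FOOD_CALORIES.items())
    burned = sum(counts[word] * cal for word, cal in ACTIVITY_CALORIES.items())
    if burned == 0:
        return None
    return consumed / burned


# Made-up tweets standing in for one state's sample.
sample_tweets = [
    "Doughnut run before work",
    "An hour on the treadmill",
    "bacon and more bacon",
]
print(caloric_ratio(sample_tweets))
```

In this toy version, a ratio above 1 means the sample's food words carry more calories than its activity words. Running the same function on each state's tweets would give one number per state to compare.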
“In many of the states where obesity rates are the highest, the calories being consumed is a lot higher than the calories being burned,” says Chris Danforth, an applied mathematician and assistant professor at UVM. But while the algorithm's results are in line with public health data, Danforth acknowledges that Twitter represents a limited sample size -- so the Lexicocalorimeter is no replacement for public health surveillance. Rather, it's a tool that complements more traditional measures of health.
“We certainly don't know how long they're running or how many hot dogs they're eating, but from a higher level looking down on Earth you can see what's going on with people's health,” Danforth says. He likens the Lexicocalorimeter to early versions of Google Flu Trends, a service from Google that estimates influenza activity based on Google searches of terms like “flu,” “cold” and “sick.” Google Flu Trends, while nowhere close to predicting influenza outbreaks, has drawn interest from public health authorities in many countries.
Like Google Flu Trends, the Lexicocalorimeter's algorithm has been calibrated to eliminate false positives. The word apple, for example, can mean more than just the food. “If it's a food usage, we assign a calorie for it. If it's the company we don't,” explains Danforth. Together with Peter Sheridan Dodds and an interdisciplinary team, he published an early (and not yet peer-reviewed) version of a study explaining the Lexicocalorimeter on the scientific preprint site arXiv.
“Twitter is really useful for learning what people are talking about, and what people are doing,” says Mark Dredze, a researcher at Johns Hopkins who studies social media and health and was not involved in the study.
“Exploring that is the first stage,” Dredze says. “The second stage is developing better algorithms for the types of questions being asked in public health and determining who in public health will benefit from this information.”
Fine-tuning these algorithms is key to improving large-scale analysis of social media, whether the goal is to measure the caloric content of a tweet or to find the next developing news story. These technologies represent new ways of finding and understanding the conversations we're having as a country -- chatter that is increasingly moving online.
And developing tools like the Lexicocalorimeter is just plain fun. “We can make maps of the U.S. based on how often people talk about rock climbing or eating kale or bacon,” says Danforth. “It's a way to explore our culture.”
Aleszu Bajak is a freelance journalist covering science across the Americas. A former producer for "Science Friday" and Knight Science Journalism Fellow at MIT, he now teaches journalism at the Media Innovation Program at Northeastern University in Boston.