This kind of social media surveillance could inform new health policies and interventions related to e-cigarettes.
“We wanted to see what age group is using them and how,” said Paul Krebs, a clinical psychologist in the Department of Population Health who has studied tobacco use for 15 years. More than 10 percent of Americans have now tried e-cigarettes, according to estimates that reflect the products' dramatic rise in popularity. But national surveys don't explain why people use them, Krebs said. “Are they really attracted to the flavors? Are they thinking it's safer than smoking?”
Krebs and a team from NYU and the Simons Center for Data Analysis in New York built a database of tweets sent from Twitter accounts that followed some of the largest e-cigarette brands. After filtering by their target hashtags, they classified the resulting 13,146 tweets by hand and then programmed a computer to sort them automatically.
Their study established a system that automatically distinguishes tweets indicating e-cigarette use (because they describe liking a certain brand, for example) from those with marketing themes or product headlines.
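For readers curious what such a pipeline might look like, here is a minimal sketch: keep only tweets carrying target hashtags, hand-label them, then train a classifier to separate "use" tweets from "marketing" tweets. The article does not name the team's hashtags or algorithm, so the hashtags (`#vape`, `#ecig`), the toy tweets and labels, and the choice of a multinomial naive Bayes classifier are all illustrative assumptions, not the study's method.

```python
import math
import re
from collections import Counter

# Hypothetical target hashtags (assumption; the study's list is not given).
TARGET_HASHTAGS = {"#vape", "#ecig"}


def tokenize(text):
    """Lowercase a tweet and split it into words and hashtags."""
    return re.findall(r"#?\w+", text.lower())


def has_target_hashtag(text):
    """Step 1: keep only tweets that contain at least one target hashtag."""
    return any(tok in TARGET_HASHTAGS for tok in tokenize(text))


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    A stand-in classifier chosen for illustration only.
    """

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {lab: Counter() for lab in self.labels}
        for text, lab in zip(texts, labels):
            self.word_counts[lab].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for lab in self.labels:
            counts = self.word_counts[lab]
            total = sum(counts.values())
            # Log prior plus the smoothed log likelihood of each token.
            score = math.log(self.doc_counts[lab] / n_docs)
            for tok in tokenize(text):
                score += math.log((counts[tok] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = lab, score
        return best_label


# Toy tweets with hypothetical hand labels (not data from the study).
raw = [
    ("loving my new mango flavor #vape", "use"),
    ("just vaped cherry all afternoon #ecig", "use"),
    ("50% off starter kits today only #vape", "marketing"),
    ("shop our new flavors, free shipping #ecig", "marketing"),
    ("nice weather for a walk today", "off-topic"),  # dropped by the filter
]

# Step 1: hashtag filter; Step 2: train on the hand-labeled survivors.
kept = [(t, lab) for t, lab in raw if has_target_hashtag(t)]
model = NaiveBayes().fit([t for t, _ in kept], [lab for _, lab in kept])

print(model.predict("free shipping on starter kits #vape"))  # → marketing
print(model.predict("loving this cherry flavor #ecig"))      # → use
```

With only four training tweets this is a toy; a real system would need thousands of labeled examples, as in the study, and the kind of bot and spam filtering Danforth describes below.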
“This could have direct clinical applications if we find out kids are really loving these flavors or thinking it's perfectly safe,” Krebs said. “But we do need to validate that what [people are] actually saying is what they're actually doing.”
The team recently presented its study at the Pacific Symposium on Biocomputing, a peer-reviewed conference that brings together researchers applying computational methods to biology. This was the first year the meeting included studies on social media monitoring for public health surveillance. Previous studies have tracked the explosive popularity of e-cigarettes on Twitter and examined manufacturers' pervasive marketing tactics there.
Drawing public health conclusions about e-cigarette use from a collection of tweets can be precarious, of course. “But that's not necessarily a bad thing,” said Michael Paul, an assistant professor of information science at the University of Colorado Boulder who has tracked topics such as air quality, influenza and bath salts on social media. “Because these products are so new and government-run surveys take a few years to catch up, researchers are still trying to figure out the landscape,” he said. “There's this time lag with national surveys that the social media can fill in.”
Chris Danforth, an applied mathematician at the University of Vermont who has built similar tools to mine social media for public health data, agrees that Twitter can help with topics for which traditional survey data are hard to obtain. But there's a lot of noise to filter out, he cautioned.
Judging from his own work tracking vaping (inhaling and exhaling the vapor produced by an e-cigarette), Danforth said e-cigarette tweets are heavily dominated by automated spam and marketing accounts: up to 80 percent of such tweets are spam, marketing or both. In his view, Krebs and his team didn't filter out enough of them.
“They weren't as careful as they could be,” he said. “We found that testimonials like 'I quit smoking using this brand' were all promotional in nature.”
If researchers want to use social media to complement traditional public health surveillance, the computational tools must be improved to eliminate false positives, Danforth said. “We don't want to be basing health policy off promotional material.”