Because the CDC's data collection from hospitals and physicians involves a time lag, a system that uses Twitter might be able to reveal a spike in flu cases more quickly, said David Broniatowski, an assistant professor in the George Washington University's Department of Engineering Management and Systems Engineering. Broniatowski did much of the research with a team of colleagues while he was at Johns Hopkins.
"We’re actually able to track at the municipal level," he said. "We can provide them with the data about what the flu is like in their city. That allows them to do surge planning."
The main problem with using Google searches or tweets to determine flu incidence is that people use both to discuss the flu in general, not just to complain about symptoms and exposure or to mention medication, especially after news coverage of the flu. Broniatowski said his researchers developed an algorithm that separated chatter from useful information.
They did that by putting 10,000 tweets on Amazon Mechanical Turk and paying people to determine whether each tweet was an actual complaint about exposure to the flu rather than just a reference to it. They then applied the algorithm to a much larger group of tweets. (Disclosure: Amazon CEO Jeff Bezos owns the Washington Post.)
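For readers curious how hand-labeled tweets can teach a program to sort chatter from actual illness reports, here is a minimal sketch in Python. It is not the researchers' algorithm, which the article does not describe in detail; it swaps in a simple naive Bayes word-count classifier as a stand-in. The function names and the six sample tweets are invented for illustration, standing in for the 10,000 tweets labeled by paid workers.

```python
import math
from collections import Counter

def tokenize(text):
    # Simplistic assumption: lowercase the tweet and split on whitespace.
    return text.lower().split()

def train(labeled_tweets):
    """Count words per label. label is True for an actual illness report."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in labeled_tweets:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def is_infection_report(text, model):
    """Return True if the tweet looks more like an illness report than chatter."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (True, False):
        # Start from the share of training tweets carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are skipped
                # Add-one smoothing keeps unseen word/label pairs from zeroing the score.
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return scores[True] > scores[False]

# Invented examples standing in for the human-labeled tweets.
labeled = [
    ("i have the flu and a fever", True),
    ("stuck in bed with the flu feeling awful", True),
    ("my son gave me the flu i feel sick", True),
    ("flu season news cdc reports more cases", False),
    ("get your flu shot at the pharmacy today", False),
    ("news report says flu vaccine shortage", False),
]

model = train(labeled)
print(is_infection_report("i feel sick with a fever", model))
print(is_infection_report("cdc news report on vaccine shortage", model))
```

Once trained on the labeled examples, the same function can be run over any number of unlabeled tweets, which mirrors the step of applying the algorithm to a much larger group. A real system would need far more training data and more careful text handling than this sketch.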
It's also possible to determine a tweet sender's location much of the time.
"Real-time tools such as our system," the researchers wrote, "have the potential to enable clinicians to anticipate the need for surges in influenza-like illness up to two weeks in advance of existing data collection strategies. Early knowledge of an upward trend in disease prevalence can inform patient capacity preparations and increased efforts to distribute the appropriate vaccine or other treatment."
Still unclear, Broniatowski said, is whether a Twitter-based method will prove accurate in rural areas, where fewer people use the service.