The next time you look up flu symptoms on Wikipedia, you might be helping experts track the virus's spread. Researchers are reporting success in using Wikipedia traffic -- specifically the traffic on pages related to flu -- to predict the infection trends of a flu season.
The Centers for Disease Control and Prevention isn't great at tracking flu trends. Its data, which comes from the reporting of healthcare providers around the country, is always about two weeks behind -- so the agency gets a great picture of the flu season once it's over, but can't see spikes in diagnoses in real time.
And even when health officials look at the big picture, CDC data only includes flu patients who sought treatment, leaving out the many who suffered through the virus at home. Since the flu takes anywhere from 3,000 to 49,000 U.S. lives each year, catching the actual peaks of infection can make a big difference.
A year ago, the CDC launched a competition to find better flu models, especially those using social media and Internet data. This recent model, from a team led by Kyle Hickmann of Los Alamos National Laboratory, uses an algorithm to link traffic on flu-related Wikipedia pages with CDC data from the same period.
Once the researchers taught their algorithm how page traffic and diagnoses were connected, the model was able to predict the 2013-2014 flu season in real time.
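To make the general idea concrete, here is a minimal sketch in Python. It is not the Los Alamos team's algorithm, which the article does not describe in detail. It assumes a simple straight-line relationship between weekly page views and the share of doctor visits for flu-like illness, and every number in it is invented for illustration. The function names `fit_line` and `nowcast` are hypothetical.

```python
# Illustrative sketch only: a plain least-squares line, NOT the researchers' model.
# All numbers below are made up; they are not real Wikipedia or CDC data.

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def nowcast(page_views, slope, intercept):
    """Estimate the current flu rate from this week's page views."""
    return slope * page_views + intercept


# Hypothetical history: weekly views of flu-related pages, paired with the
# flu rate (percent of doctor visits) that official reports later confirmed.
weekly_views = [1000, 1500, 2200, 3000, 2600]
weekly_flu_rate = [1.1, 1.4, 2.3, 2.9, 2.7]

# "Teach" the model how traffic and diagnoses were connected in the past.
slope, intercept = fit_line(weekly_views, weekly_flu_rate)

# This week's traffic is known immediately; official data lags about two weeks.
estimate = nowcast(4000, slope, intercept)
print(f"Estimated flu rate this week: {estimate:.2f}%")
```

The point of the sketch is the timing: page traffic is available right away, so a relationship learned from past seasons can stand in for official figures until they arrive.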
This isn't the first successful use of Wikipedia in flu-tracking. In April, a PLOS Computational Biology study (by a different group of researchers) boasted a Wikipedia-based model. It was more accurate than Google's popular flu trend monitoring, which uses Google searches to predict cases of the flu. Google is generally considered the best real-time alternative to CDC data, but its results can be skewed by media hype: When lots of people are Googling "swine flu" because they've heard it's a threat, they'll trick Google Flu Trends into recording a much higher spike in infection.
It's possible, the PLOS researchers suggested, that people are more likely to go to Wikipedia articles when they're concerned about symptoms they have -- while Google might just be the first place to go when you're looking for news about a possible pandemic.
Neither of these algorithms is perfect, but it probably won't be long before our Web browsing histories are being used to track global disease trends.