Although efforts to create an AIDS vaccine have gone on for more than three decades, none has produced a successful result. Strategies that have worked for more than a dozen other pathogens are easily thwarted by the slippery and unpredictable quick-change artist known as human immunodeficiency virus, or HIV.
HIV has evolved into a master of avoiding immune system detection by undergoing spontaneous mutations that effectively make it a moving target. The most common and virulent strain of the virus, HIV-1, has the highest reported mutation rate of any biological system, according to a study published in PLOS Biology last year. In fact, the number of mutations in a single HIV-infected patient is comparable to all the mutations that have ever occurred in the history of the influenza virus.
But according to a machine learning expert, HIV's mutation-happy defense mechanism also could point to its eventual downfall. David Heckerman, distinguished scientist and senior director of the Genomics Group at Microsoft Research, is combining his expertise in computer science with his background as a medical doctor to apply machine learning to the creation of an AIDS vaccine.
In the 1990s, Heckerman invented the spam filter by creating computer programs that were able to learn and make predictions about spam vs. other email. Now he's using some of those techniques, along with thousands of computers, to analyze HIV's many mutations and pick out the ones that spell death for the virus. His hope is to create a vaccine that teaches the body's immune system to target these vulnerable fragments of HIV that can't escape through mutation.
We spoke to Heckerman about the similarities between fighting spammers and HIV, how he discovered the virus's weak link and his plans for an AIDS vaccine.
Q: Earlier in your career, you used machine learning to invent the spam filter — now, of course, a key component of all email systems. Describe what you had to do in order to thwart spammers.
I invented the spam filter, which uses machine learning, back in 1997. Basically, you show a computer examples of spam mail and non-spam mail, and then you apply machine learning algorithms to predict whether an incoming message is spam or not. We deployed the spam filter at Microsoft, and we noticed something interesting.
First, our spam filter would pick up on certain words, like "Viagra." But spammers would catch on to this and replace the last "a" with an at sign ("Viagr@"). To a human, it looks like the same word, but to a computer program, it doesn't. As a result, these messages were getting through our spam filter. We would change our spam filter to catch the at sign at the end, and spammers would just do something else, like embed the word in a bitmap image.
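To make the cat-and-mouse game concrete, here is a minimal sketch of a word-based filter that learns from labeled examples. The interview does not say which algorithm the original filter used; this sketch assumes a simple naive Bayes classifier, and the four training messages are invented purely for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace, so "viagr@" is a different token from "viagra".
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Toy word-count classifier (an assumed stand-in, not the original filter)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                if word not in vocab:
                    continue  # a word never seen in training carries no evidence
                smoothed = self.word_counts[label][word] + 1  # Laplace smoothing
                score += math.log(smoothed / (total_words + len(vocab)))
            log_scores[label] = score
        # Turn the two log scores into P(spam | text).
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


# Invented training examples, for illustration only.
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("buy viagra now", "spam")
spam_filter.train("cheap viagra online", "spam")
spam_filter.train("meeting notes attached", "ham")
spam_filter.train("lunch at noon tomorrow", "ham")

plain_score = spam_filter.spam_probability("viagra for you")
disguised_score = spam_filter.spam_probability("viagr@ for you")
print(plain_score, disguised_score)
```

In this toy setup the disguised message falls back to the prior, because the filter has never seen the token "viagr@". That is the gap spammers exploited.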
We were going back and forth like this for a long time, but eventually we stepped back to be more strategic. We realized we should go after the weak link of spammers, which was their need to extract money. We basically scanned the Internet for the websites that were collecting money, built a catalogue of those sites, and if a message had a link to one of these pages, we would increase the probability that it was spam. This is back when people weren't buying much on the Internet, so this spam filter worked really well because one of the main things being bought online was spam-related. A link to one of those sites was a very telltale sign that the message you were receiving was spam.
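The catalogue idea can be sketched in a few lines. This is an illustration under stated assumptions, not the deployed system: the site names in `PAYMENT_SITES` are hypothetical, and the `likelihood_ratio` of 20 is an arbitrary value chosen to show how a single strong signal can raise a message's spam probability.

```python
import re
from urllib.parse import urlparse

# Hypothetical catalogue of money-collecting sites (reserved .example names).
PAYMENT_SITES = {"pills-checkout.example", "fast-cash.example"}

URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def linked_hosts(message):
    """Return the set of host names that the message links to."""
    hosts = set()
    for url in URL_PATTERN.findall(message):
        host = urlparse(url).hostname
        if host:
            hosts.add(host.rstrip("."))
    return hosts


def adjust_spam_probability(base_probability, message,
                            catalogue=PAYMENT_SITES, likelihood_ratio=20.0):
    """Raise the spam probability when the message links to a catalogued site.

    The update multiplies the odds by an assumed likelihood ratio, which is
    one simple way to combine a new signal with an existing estimate.
    """
    if not (linked_hosts(message) & catalogue):
        return base_probability
    if base_probability >= 1.0:
        return 1.0
    odds = base_probability / (1.0 - base_probability)
    boosted = odds * likelihood_ratio
    return boosted / (1.0 + boosted)


with_link = adjust_spam_probability(
    0.2, "Order at http://pills-checkout.example/buy today")
without_link = adjust_spam_probability(0.2, "See you at lunch tomorrow")
print(with_link, without_link)
```

The point of the design is that the signal is tied to something the spammer cannot easily change. The wording of a message can mutate freely, but the link to the page that collects the money has to stay.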
Q: How does your work with spam filters relate to the seemingly unrelated task of finding an HIV vaccine?
I started getting interested in this problem with HIV, and it turns out there's a very close analogy here: Spammers mutate their spam messages to work around our filters, and HIV mutates itself to avoid attack by our immune system. Proteins are strings of amino acids that fold up and form little machines that do their good or bad things — in the case of HIV, it's bad things. HIV can mutate those proteins very robustly. It's amazing how many mutations HIV can withstand and still survive, be able to replicate and do its damage.
As with spammers, we were thinking that there has to be a weak link somewhere. There have to be some regions of the protein that, when they mutate, are going to really hurt HIV, maybe even kill HIV. So we started using machine learning in two different ways to find these weak links. In one way, I collaborated with Bruce Walker at the Ragon Institute, who had a cohort of people who were infected with HIV but not severely sickened by the virus. They're called HIV controllers. Using machine learning, we compared the places where their immune systems were attacking HIV to the places where the immune systems of non-controllers were attacking. Sure enough, we found some regions where there were differences in where these attacks were happening, which suggested that those regions might be vulnerable to attack.
Then we used another machine learning technique where we simulated the physical properties of the proteins after they underwent mutations at various sites, and we were able to see which mutations destabilized the protein. These two different techniques, comparing controllers with non-controllers and running the physics simulations of the proteins themselves, actually yielded roughly the same answers. They pointed to the same regions of HIV that are vulnerable.
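The first of those two approaches, comparing where controllers and non-controllers attack the virus, can be illustrated with a small statistical sketch. The interview does not describe the actual method, so this swaps in a plain one-sided Fisher's exact test per region. The region names and the two small cohorts are invented for illustration and are not real patient data.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """P-value for group 1 targeting a region at least `a` times by chance.

    2x2 table: a = group 1 targets the region, b = group 1 does not,
               c = group 2 targets the region, d = group 2 does not.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / total


def enriched_regions(controllers, non_controllers, alpha=0.05):
    """Regions targeted more often by controllers than chance would suggest.

    Each cohort is a list of sets, one set of targeted regions per person.
    No multiple-testing correction is applied in this sketch.
    """
    regions = set().union(*controllers, *non_controllers)
    hits = {}
    for region in regions:
        a = sum(region in person for person in controllers)
        c = sum(region in person for person in non_controllers)
        p = fisher_one_sided(a, len(controllers) - a, c, len(non_controllers) - c)
        if p < alpha:
            hits[region] = p
    return hits


# Invented toy cohorts: each set lists the regions one person's immune system targets.
controllers = [{"region_A", "region_B"}, {"region_A"}, {"region_A", "region_C"},
               {"region_A", "region_B"}, {"region_A"}, {"region_A", "region_C"}]
non_controllers = [{"region_B"}, {"region_C"}, {"region_B", "region_C"},
                   {"region_B"}, {"region_C"}, {"region_A", "region_B"}]

hits = enriched_regions(controllers, non_controllers)
print(hits)
```

A region that controllers target far more often than non-controllers is a candidate weak link. In real work it would then need checking against an independent line of evidence, such as the protein simulations Heckerman describes.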
Q: What are your next steps in applying this knowledge about the vulnerabilities of HIV to one day treat patients?
The next step is to design a vaccine that you could give to an individual that would teach their immune system to attack at those vulnerable regions instead of random places. Several known mechanisms may work. We are exploring which ones will be safest and most effective.