We all know that computers are pretty good at crunching numbers. But when it comes to analyzing reams of data and looking for important patterns, humans still come in handy: We're pretty good at figuring out which variables in the data can help us answer particular questions. Now researchers at MIT claim to have designed an algorithm that can beat most humans at that task.
Max Kanter, who created the algorithm as part of his master's thesis at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) along with his advisor Kalyan Veeramachaneni, entered the algorithm into three major big data competitions. In a paper to be presented this week at the IEEE International Conference on Data Science and Advanced Analytics, they announced that their "Data Science Machine" has beaten 615 of the 906 human teams it has come up against.
The algorithm didn't get the top score in any of its three competitions. But in two of them, it created models that were 94 percent and 96 percent as accurate as those of the winning teams. In the third, it managed to create a model that was 87 percent as accurate. The algorithm used raw datasets to build models predicting things such as when a student would be most at risk of dropping an online course, or what indicated that a customer shopping during a sale would turn into a repeat buyer.
Kanter and Veeramachaneni's algorithm isn't meant to push human data scientists out, at least not anytime soon. But since it seems to do a decent job of approximating human "intuition" with much less time and manpower, they hope it can provide a good benchmark.
"If the Data Science Machine performance is adequate for the purposes of the problem, no further work is necessary," they wrote in the study.
That might not be sufficient for companies relying on intense data analysis to help them increase profits, but it could help answer data-based questions that are being ignored.
“We view the Data Science Machine as a natural complement to human intelligence,” Kanter said in a statement. “There’s so much data out there to be analyzed. And right now it’s just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving.”
This post has been updated to clarify that Kalyan Veeramachaneni also contributed to the study.