Despite all the talk about companies using big data to uncover insights, maybe automation is the real reason the world is so excited about big data. What makes the big data era so significant isn’t that people are using data to inform their decisions, but that there’s just too much data of too many different types for people to handle by hand. In many cases, keeping up isn’t so much a matter of changing mindsets as it is about getting better tools.
Last week, New York Times reporter Steve Lohr wrote about the possibility of a big data bubble forming because people rely too much on data at the expense of experience and intuition. It got me thinking about all the technologies and algorithms I’ve covered, and all the discussions I’ve had about why a data scientist is more than just a statistician who can write MapReduce jobs. Nearly everywhere, it seems to me (save for, as Lohr cites, unique uses such as algorithmic trading), big data really is less about replacing human intuition than it is about augmenting the human experience by making it easier, faster and more efficient.
Like the purpose-built robots that have revolutionized manufacturing, today’s methods for processing and analyzing data are fast, scalable and precise, but they don’t yet (in most cases) make our decisions. Big data can make life and business a lot more efficient, but for the time being, human judgment and willpower are still very much in control.
Offloading grunt work to the machines
We’ve recently covered some obvious examples of this. Take recent university research demonstrating how media researchers could use machine learning and natural-language processing to save themselves the work of manually reading and coding every piece of text they wish to analyze as part of a study. Algorithms, like robots in manufacturing, are doing the mindless, repetitive tasks of discerning subject matter, keywords and sentiment, but researchers are still the ones poring over those results and telling us what it all means.
A couple of months ago, I spoke with Recommind CEO Bob Tennant about how attorneys are using software to pore over terabytes’ worth of electronic documents during the discovery process. Predictive coding, as it’s called, frees them up to focus more on case strategy than on the tedium of analyzing every single PDF and email message to figure out if it’s relevant to a case. However, he noted, although the software typically does a better job than a person alone would do, most law firms still use a hybrid man-machine approach to leverage the strengths of both and ensure nothing gets missed. And the software certainly doesn’t assess a document’s relative legal relevance in light of a case’s facts and craft an argument around it.
Even software products such as BeyondCore, which aim to minimize human involvement in the data analysis process as much as possible, are actually just about making business people more efficient. In this case, people are only integral to the first and final steps: selecting the metric with which they’re concerned and then interpreting the statistical correlations, respectively. The messy middle step of asking the right questions is (in theory) eliminated by software that analyzes all the possible correlations, then scores them and presents them ranked by that score.
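To make that division of labor concrete, here is a minimal sketch of the general idea, not BeyondCore’s actual method, which I haven’t seen. It assumes plain Pearson correlation as the score, and the function names (`pearson`, `rank_correlations`) and the tiny data set are made up purely for illustration. A person picks the metric, the code scores every other column against it, and a person still has to decide what the ranking means.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_correlations(columns, metric):
    """Score every other column against the chosen metric, strongest first."""
    target = columns[metric]
    scores = [
        (name, pearson(values, target))
        for name, values in columns.items()
        if name != metric
    ]
    # Rank by absolute strength so strong negative relationships surface too.
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical data, invented for this example only.
data = {
    "revenue": [10.0, 12.0, 14.0, 16.0, 18.0],
    "ad_spend": [1.0, 2.0, 3.0, 4.0, 5.0],
    "support_tickets": [9.0, 7.0, 6.0, 4.0, 1.0],
    "office_temp": [21.0, 19.0, 22.0, 19.0, 21.0],
}

# Step 1 (human): choose the metric. Step 2 (machine): score everything else.
ranked = rank_correlations(data, "revenue")
for name, score in ranked:
    print(f"{name}: {score:+.3f}")
```

The ranking is where the software stops. Whether ad spending drives revenue, or both simply rise together, is still a judgment call for whoever reads the list.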