But perhaps a better term would be “data artists,” befitting the artfulness that goes into interpreting and illustrating big datasets. These scientists may be less the Einsteins and Edisons of the big data revolution than its Van Goghs and Picassos.
The scale of the current big data revolution is right there in the name: datasets trending toward petabyte size (a petabyte is 1 million gigabytes) or more. And it’s all about complexity: structured data, unstructured data, and multiple datasets, all jumbled up. So big data requires big brains and massive computing brawn.
But it requires more, too, much more, according to author and data scientist Bill Franks.
Even more than size and complexity, Franks says, big data is increasingly about creativity: finding interesting patterns and following them down the rabbit hole.
Sometimes, nothing turns up. Other times, data artists discover wonderland. A very profitable wonderland.
“You will not find an agreed-upon definition for ‘big data,’” Franks says. “I’ve had vigorous debates about that.”
The general guidelines he offers combine several factors: volume, size, variety, and complexity. One of the most interesting and relevant is complexity. Big data doesn’t always fit into neat columns. Some data fits comfortably in traditional SQL (Structured Query Language) databases, and some does not.
For example, web analytics data from server logs is fairly structured, but tweets or Instagram photos are not. Sensor data reporting customer traffic in a retail store may be neat and clean, but data artists will probably want to match it up against an entirely different dataset: sales.
And then factor in weather data, and time of year, and the price of gasoline. Probably not the position of Pluto, but in big data you never really know what bit of information might prove to be critical.
“There have always been individuals who take a company’s data and find interesting patterns using data-mining and predictive analytics,” says Franks. “The technical abilities to do so are now table stakes to be successful in a business environment.”
But now in the era of big data, these technical abilities need to be married with softer skills: commitment, and especially creativity.
So the key to being a good data artist, Franks suggests, is creativity.
“There’s no need to be creative if you have the exact data for the exact problem … but the reality is that you have to make assumptions, deal with inconsistencies, and then choose a model that may not be perfect.”
That’s what takes creativity, and the best data scientists are, in fact, artists.
The tools of their trade, of course, are not palette, brush, and canvas. Increasingly, they are not traditional databases and SQL either, but tools like Hadoop and its MapReduce programming model, which process huge amounts of data across a large number of machines, or spur-of-the-moment solutions hand-coded in Java or other languages.
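For readers who haven’t met MapReduce, the basic pattern fits in a few lines. The sketch below is a toy, single-process illustration in Python, not Hadoop’s actual API: the function names and the “store,amount” record format are hypothetical. It shows the three stages that Hadoop spreads across many machines: map each raw record to a key/value pair, shuffle the pairs so values with the same key end up together, and reduce each group to one result (here, total sales per store).

```python
from collections import defaultdict


def map_phase(record):
    """Map: turn one raw 'store,amount' line (hypothetical format) into a (key, value) pair."""
    store, amount = record.split(",")
    return store.strip(), float(amount)


def shuffle(pairs):
    """Shuffle: group values by key. In Hadoop this step moves data between machines."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce in sequence on a single machine."""
    pairs = (map_phase(r) for r in records)
    groups = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in groups.items())


if __name__ == "__main__":
    sales = ["store_1,20.00", "store_2,5.00", "store_1,10.00"]
    print(map_reduce(sales))  # {'store_1': 30.0, 'store_2': 5.0}
```

The appeal of the pattern is that map and reduce calls are independent of one another, so a framework can run thousands of them in parallel on different machines; only the shuffle step requires coordination.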
The business results are clear. Better products, better customer service, better asset utilization: All results of using big data.
Walmart’s rise over the last few decades was largely a technology-powered revolution in retail. Google’s ability to be Google depends entirely on the company’s success in acquiring and managing massive sets of data. The same is true of Facebook’s ability to keep almost a billion people connected.
But it’s not just about the big names in the technology world. Airlines scheduling flights and setting prices use big data to maximize profitability. Banks stocking ATMs with cash manage datasets on seasons and events in addition to their own customer data. And Teradata now has 36 members of its “Petabyte Club” — clients who have at least one but often dozens or scores of petabytes of data — up from just five in 2007.
“Big data is so much in the news lately that some might argue it’s overhyped,” Franks says. “But so many people are getting excited … they can’t all be wrong. And the new value that can be driven is just really cool.”
Franks sums up the job of a data artist this way:
“You need to use imperfect data and imperfect methods in an insufficient timeframe to get enough data to make a good business decision.”
The end result? Green on the balance sheet.
Copyright 2012, VentureBeat