Data scientists: Do they even exist?
Data data everywhere, but not a drop to shrink
Open ... and Shut Big Data is all the rage. Now if only someone had a clue what to do with it.
According to a new survey of senior executives by Big Data consultancy NewVantage, Big Data is "top of mind for leading industry executives," but these same executives struggle to find the right people to analyse their data. In fact, while 70 per cent of those organisations surveyed plan to hire data scientists, 100 per cent of them said they find it at least "somewhat challenging" to hire competent data scientists.
Given the difficulty in finding qualified people to analyse data, it's perhaps not surprising that only 0.5 per cent of enterprise data gets analysed, according to IDC. But if this is the case, why is Big Data so big?
After all, Gartner expects Big Data to drive $34bn in IT spending in 2013. Some companies, like Sears, clearly "get" Big Data and are putting it to work. But for the unwashed masses of enterprise IT, it sounds like Big Data is an aspiration, not a reality.
Still, it's an aspiration that has hard dollars chasing it. Of the top-10 job skills in demand on Indeed.com's job boards, two of them are Big Data-related. Over time, however, I expect this data scientist arms race to be absorbed by two other trends:
1. Big Data technology being embedded into applications and
2. Enterprises training existing employees on Big Data technologies rather than hiring data scientists.
On the first trend, Cloudera chief executive Mike Olson perhaps said it best when he argued that the value of big-data technology like Hadoop will increasingly be delivered through applications. Enterprises won't need data scientists as their applications will process and analyse the data for them. Yes, someone will still need to know which questions to ask of the data, but the hard-core science of it should be rendered simpler by applications.
The second trend is equally important, and was called out by Gartner analyst Svetlana Sicular, who posits: "Organisations already have people who know their own data better than mystical data scientists" and that: "Learning Hadoop is easier than learning the company’s business." So the focus of enterprises should be training employees to use tools like Hadoop, not wasting cycles and recruiting fees scouring the planet for mythical data scientists.
All of which should provide some comfort to those organisations that have been struggling to find data scientists to analyse their data. It may turn out that the "mythical data scientist" is actually Lily who works one cubicle over. ®
Matt Asay is vice president of corporate strategy at 10gen, the MongoDB company. Previously he was SVP of business development at Nodeable, which was acquired in October 2012. He was formerly SVP of biz dev at HTML5 start-up Strobe (now part of Facebook) and chief operating officer of Ubuntu commercial operation Canonical. With more than a decade spent in open source, Asay served as Alfresco's general manager for the Americas and vice president of business development, and he helped put Novell on its open source track. Asay is an emeritus board member of the Open Source Initiative (OSI). His column, Open...and Shut, appears three times a week on The Register. You can follow him on Twitter @mjasay.
I really do hate the term "big data". Partly because of my pedantry (how can data be big?), mainly because it feels like yet another marketing buzz-phrase.
The alternatives don't sound good
While you can train a reasonably competent person to perform some basic analysis, and perhaps use a few key techniques, if you look at a proper course on Data Analysis (such as the one running on Coursera at the moment), you'll see that there is a wide range of techniques, complex statistical underpinnings, and many things that you can do wrong.
If you have someone who knows the business, but only has some training on how to use a few tools, they won't know about the rights and wrongs of data cleaning, various issues that can introduce bias, how to correctly estimate confidence levels, etc. Since you can often make statistics seem to back up a range of conflicting viewpoints just by biasing the selection of data, a lot can go wrong if you assume that what a proper data analyst has studied can be easily learned from a couple of short training courses.
I say this as a programmer with an interest in data analysis, seeing just how much there is to cover in big data technologies, statistical methods, underlying mathematics, statistical programming languages, reporting standards and more. It's a big subject, and I don't think data scientists can be adequately replaced by an existing employee receiving a little training in Hadoop.
data scientists could combine data in new and stupid ways
We've already been seeing the work of "data scientists" in databases like shopping carts.
People who bought this router also bought:
Are any of these "data scientists" reportedly sought out by industry actually supposed to end up doing anything that a more traditional scientist might regard as science? E.g., "data R&D", or "data theory", "data experiments", or whatever might make sense ... and what would they be?
I'm not trying to make a point either way, I'm just curious as to what industrial data scientists might actually be expected to do.
I'm an unemployed data expert
I've been transforming data into business information for 18 years, saving companies billions. Exactly what these guys are probably looking for, but I'm titled as an 'Informix DBA', not SQL Server/Oracle, so I'll always be unemployed and not eligible for benefits, unlike all the made-up CVs from Indians (see this so much) who took some meaningless Microsoft certification or lied their way into a job.
Funny story in that it's great there are so many jobs and people looking for someone like me, gives me some hope of not starving soon.