Microsoft says scale-out storage not needed for big data
SQL Server guru also thinks data scientists have a limited role … for now
Infrastructure vendors’ vision of big data rigs based on scale-out NAS won’t come to fruition, according to the Microsoft executive heading the company’s big data push for SQL Server 2012.
David Campbell, a Microsoft Technical Fellow in the Data & Storage Platform Group, claims personal responsibility for Microsoft’s adoption of Hadoop as an app on both the Windows platform and Azure. In conversation with The Register he declared himself “diametrically opposed” to big data rigs built on scale-out storage. To explain his reasoning he challenged your correspondent to “do a census of big data implementations. Ask how many are built on off-the-shelf products and how many are built on scale-out storage.”
All the examples Campbell could list were built on commodity hardware. “The evidence is there if you look,” he said, asserting that the market has already decided that a RAIS – redundant array of independent servers – approach typified by the way containerised data centres operate is superior to the somewhat exotic hardware involved in scale-out NAS or dedicated analytical appliances.
“I used to think the big online players were outliers,” he added. Now he thinks they got the approach to big data right the first time.
Campbell is also cool on the role of data scientists, analytical experts who blend hard-core data-crunching skills with an understanding of moving bits at scale, and who can then translate their efforts into business insights. Such workers are in very short supply, he says, and the industry cannot assume that the crop currently enrolled in the first university courses in the discipline will reach the workplace within the next five years.
Microsoft has therefore tooled SQL Server 2012 so it can satisfy a data scientist’s darkest big data desires, then pass the results of their efforts to lesser folk in IT and around the business.
“People ask me if they need to hire a data scientist,” Campbell told a Microsoft event in Sydney today. “I say that if they can connect their people to the output a data scientist creates, maybe not all the work needs to be done in-house.” To help things along, SQL Server 2012 therefore includes Power View, a new data visualisation tool which makes it easy - with the help of Excel - to turn data into something the average executive can understand.
Campbell is also optimistic that, over time, more suit-wearing types will be happy to drive tools like Power View, as "millennials" fluent in Excel syntax enter the workforce and start to crunch their own data. ®
Course you don't.
640K is all you'll ever need.
Fluent in Excel
Yes, writing our "big data" apps in Excel is a terrific idea. Proprietary, unreadable, very difficult to debug, unmaintainable ... it's nearly the perfect language. And the list of Excel-based disasters in finance, genomics, etc. shows just how well it does in practice.
Having those apps written by new grads with little or no real-world experience will make it even better. (My assumption is that most of the ones who do have experience, thanks to co-op programs, internships, and the like, will get better jobs.)
We'd be better off with FORTRAN '77. Hell, we'd be better off if these SQL-Server-based visualization apps were written purely in T-SQL.
turning data into something the average executive can understand
Just run the printouts through a shredder, boil it up and call it oatmeal.