Microsoft shakes SQL Server 2012's business end at big data
Still holding out for Hadoop
Flavours to savour
The user can drag and drop to their heart’s content, manipulate the data and generate visualisations. I like this too.
SQL Server 2012 comes in various flavours. For BI freaks like me, the obvious one of interest is the edition called Business Intelligence. Sure enough, this comes with Self-service BI, which is PowerView plus other features. It has Advanced Corporate BI (the tabular BISM, advanced analytics and reporting, and the VertiPaq in-memory engine). It also includes advanced data integration, Data Quality Services and Master Data Services.
That sounds like the full set, right? But what is missing from the BI edition are the data warehousing components: Column Store Index, compression and partitioning. These goodies are found only in the Enterprise edition. On the face of it, this is a very odd omission, but in a twisted kind of way it makes sense. Data warehousing is about pulling large volumes of data together from disparate sources, then cleaning and conforming it. BI is about extracting useful information from a mass of data. So it is possible, logically, to separate them. And if you already have an Oracle data warehouse and want to analyse the data therein, I guess you need only the BI edition of SQL Server. Nevertheless, this does seem to me to be a marketing, rather than a technical, distinction.
So what about big data? "Big data" is an expression that has been gaining a lot of currency. In short, it is data that doesn't sit well in neat, well-structured, two-dimensional tables, and there is usually quite a bit of it. The open-source community has been successfully holding and manipulating big data in the Hadoop Distributed File System (HDFS) and analysing it using MapReduce.
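For readers who have not met MapReduce, the model splits an analysis into a map step that emits key/value pairs from each input record, a shuffle that groups those pairs by key, and a reduce step that combines each group. The sketch below is a minimal, single-process illustration of that idea using the classic word-count example; it does not use Hadoop's actual Java API, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework would."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse the list of counts for one word into a total."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    # In a real Hadoop cluster the map calls could run on many nodes in
    # parallel; this hypothetical sketch runs every phase in one process.
    emitted = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(emitted)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["big data is big", "data about data"]
    print(word_count(docs))  # {'about': 1, 'big': 2, 'data': 3, 'is': 1}
```

The point of the split is that map and reduce calls are independent of one another, which is what lets a framework such as Hadoop spread them across a cluster holding the data in HDFS.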
Microsoft has been working with Hortonworks on a Hadoop-based version for Windows and a service for Windows Azure, Microsoft's cloud platform.
Where does this leave Microsoft and SQL Server?
We don't actually, as yet, have the finished Hadoop bits. Today all we have are bidirectional Hadoop connectors, for SQL Server and for the SQL Server Parallel Data Warehouse, which were announced in October and released at the end of 2011 along with a preview of the Hadoop service on Microsoft SQL Azure. We don't have the really big stuff, Hadoop for Windows, that Microsoft is excited about.
What does this mean strategically? Microsoft's embrace of Hadoop came just a week after Oracle's, yet the Oracle Big Data Appliance was launched in January 2012, with Big Data Connectors for integrating data stored in Hadoop and Oracle NoSQL Database. Microsoft and Oracle are more or less neck and neck at this point, but both are well behind Teradata and IBM, which, of the big players, were very early adopters.
IBM's Hadoop-powered InfoSphere BigInsights has been available since May 2010, with Teradata announcing its Hadoop-using Integrated Analytics a few months later. Of all of these, Teradata's Aster Data approach appeals to me the most. It is built around the simple premise that most, if not all, queries against big data yield tabular data, so it works hard to integrate the two querying models and allow querying across both types of data.
However, big data is still in its infancy; it will be a very exciting area for years to come.
All in all, I like SQL Server 2012. I like the BISM and I like PowerView. I think the BI solution that Microsoft offers is better integrated than those of the other major vendors. I like the fact that Microsoft is working actively on big data, and I am gobsmacked (but pleased) that it is actually working with open source on this. ®
Mark Whitehorn works as a consultant for national and international companies, specialising in databases, data analysis, data modelling, data warehousing and Business Intelligence (BI). A professor, he holds the chair of analytics at the University of Dundee, where he works as an academic researcher and lecturer and runs a master's programme in BI. Mark has been working with BI since 1987.