Microsoft shakes SQL Server 2012's business end at big data
Still holding out for Hadoop
Review Those not in “Denali Denial” will be interested in SQL Server 2012, which has just been unwrapped and is currently being delivered to manufacturers.
In this review (I used Release Candidate 0) I’ll focus on how the new database looks, feels and compares with the existing server. I’ll also take a look at how SQL Server's big data integration stacks up, not only against the current version but also against comparable offerings from rivals.
The good news from SQL Server Management Studio is that the 2012 databases look and feel very much as they did before, so those GUI skills you've acquired as well as your muscle memory of driving the previous product are going to be largely transferable.
The bad news is that the kludgy bits of the 2008 R2 GUI are still with us. When, for example, are we going to get a database diagrammer that actually aligns the end of the joins to the primary and foreign key columns? This is not algorithmically complex, would not affect the muscle memory and would aid readability enormously. Sigh.
SQL Server 2012 comes with SQL Server Developer Tools (SSDT) (codenamed Juneau). It essentially packages together the BIDs tools – SSRS, SSAS and SSRS, which are recognisable as some of the high-end tools formerly available in Visual Studio – with a new service, Database Services.
The database service essentially works with a model of the database. So what? Well, think about the simple (but dangerous!) act of dropping a table. In previous versions of SQL Server you would have to work out the dependencies and deal with them first. Alternatively, you could try to delete the table, see what error messages were thrown up, interpret them and then make decisions about what you needed to do in order to get to a stage where you could delete the table.
Database Services can tell you what dependencies exist; what the likely consequences are (data loss is probably going to be high on the list); and then generate a script to do the work for you should you decide to proceed. Don’t do this at home, kids, and certainly not at work (on a production server).
As with so much here, this description just touches the surface; if you are interested in doing integrated, more controlled database development work in SQL Server, SSDT definitely warrants investigation.
One relatively significant change in the new Microsoft world of BI is that data for analysis can be stored in cubes (as before) but also in tabular form. This has prompted a raft of requirements, technologies and so on, which can hold and perform rapid analysis on tabular data – for example in-memory storage and column storage. This, in its turn, has led Microsoft to talk about the BI Semantic Model (BISM).
BISM is not a product any more than the UDM (Unified Dimensional Model) was a product, it is simply a way of addressing the need to talk about/describe data held in either format. As such I think it is a great idea; but then I liked the UDM as well.
And for the BI end users there is PowerView (formerly known as Project Crescent). Imagine, if you will, a browser-based, Silverlight application that can connect to a besom model based on tabular data held in Analysis Services.
Flavours to savour
The user can drag and drop to their heart’s content, manipulate the data and generate visualisations. I like this too.
SQL Server 2012 comes in various flavours. For BI freaks like me the one of obvious interest is the one called Business Intelligence. Sure enough, this comes with Self-service BI – which is PowerView plus other features. It has Advanced Corporate BI (tabular BISM, advanced analytics and reporting and the VertiPaq in-memory engine). It also comes with the advanced data integration, data quality services and master data services.
That sounds like the full set, right? But what is missing from the BI version are the Data warehousing components: Column Store Index, compression and partitioning. These goodies are only to be found in the Enterprise edition. On the face of this, this is a very odd omission. In a twisted kind of a way it makes sense. Data warehousing is about pulling large volumes of data together from disparate sources, and then cleaning and conforming it. BI is about extracting useful information from a mass of data. So it is possible, logically, to separate them. And if you already have an Oracle data warehouse, and want to analyse the data therein, I guess you only need the BI version of SQL Server. Nevertheless this does seem to me to be a marketing, rather than a technical, distinction.
So what about big data? "Big data" is an expression that has been getting a lot of currency. In short, it is data that doesn’t sit well in neat, well-structured, two-dimensional tables and there is usually quite a bit of it. The open-source community has been successfully holding and manipulating big data in Hadoop Distributed File System (HDFS) and performing analysis using MapReduce.
Microsoft has been working with Hortonworks on a Hadoop-based version for Windows and a service for Windows Azure Microsoft’s cloud platform.
Where does this leave Microsoft and SQL Server?
We don't actually, as yet, have the finished Hadoop bits. Today all we have are bidirectional Hadoop connectors - for SQL Server and the SQL Server Parallel Data Warehouse - which were announced in October and released along with a preview of Hadoop service on Microsoft SQL Azure at the end of 2011. We don't have the really big stuff, Hadoop for Windows, that Microsoft is excited about.
Where does this mean strategically? Microsoft's embrace of Hadoop came just week after Oracle did the same, yet the Oracle Big Data Appliance was launched in January 2012, with Big Data Connectors for integrating data stored in Hadoop and Oracle NoSQL Database. Both Microsoft and Oracle are more or less neck and neck at this point but both are well behind Teradata and IBM who, of the big players, were very early adopters.
IBM’s Hadoop-powered InfoSphere BigInsights has been available since May 2010, with Teradata announcing its Hadoop-using Integrated Analytics a few months later. Of all of these, Teradata’s Aster data approach appeals the most to me. It works around the simple premise that most/all queries against big data yield tabular data and so it works hard to integrate the two querying models to allow querying across both types of data.
However, big data is still in its infancy; it will be a very exciting area for years to come.
All in all, I like SQL Server 2012. I like the BISM and I like PowerView. I think the BI solution that the company offers is better integrated than that of the other major offerings. I like the fact that Microsoft is working actively on big data and I am gobsmacked (but pleased) that it is actually working with open source on this. ®
Mark Whitehorn works as a consultant for national and international companies, specializing in databases, data analysis, data modeling, data warehousing and Business Intelligence (BI). A professor, he holds the chair of analytics at the University of Dundee where he works as an academic researcher, lecturer and runs a masters programme in BI. Mark has been working with BI since 1987.