We gave SQL Server 2012 one year to prove itself: What happened?
The pros, the cons and the fail in Microsoft's package
Deep dive I reviewed SQL Server 2012, codenamed Denali, just over a year back and highlighted the major improvements in Microsoft's relational database. After a year in production, was I right, or have other features proved more important in practice?
I'm bound to agree with my last review, so I polled colleagues who also work with SQL Server 2012 on a daily basis. Here's what we felt about SQL Server 2012, and its various components, in practice.
SQL Server Integration Services (SSIS)
Pros Architecturally, SSIS has become simpler and more congruent with SQL Server 2012. The development environment is improved generally and now features a zoom control - indispensable for sprawling tube-map-like packages. It has also become possible to undo and redo changes.
The unit of work in SSIS has become the project, instead of the individual package. This is more sensible conceptually and allows resources (such as connection managers and variables) to be shared between packages.
The outcome of an executed task now shows as a tick or cross; instead of just colouring the task green or red to the confusion of those who are red-green colour blind.
SSIS 2012 has also become much better at consuming flat files of data. A value in a file such as "Mario's Pizzeria" no longer causes severe indigestion, and ragged-right files (where the number of values given in each row differs) are consumed by SSIS with gusto.
Microsoft has also squashed some annoying bugs. One of these fixes is particularly useful to us: the ability to set break points in a script data flow component.
Cons Package execution reports (rendered from history automatically stored within the new SSIS database) are only viewable from within Management Studio rather than being published to Reporting Services.
Microsoft has moved the package configurations menu - and we're not sure why.
Sort is still pointlessly slow and there is no in-memory compression in the data pipeline - if this was enabled like it is in SSAS tabular mode then sorting and aggregating data flow components could run a lot faster and more data could be shoved through a server at any one time without worrying too much about RAM usage.
AlwaysOn Availability Groups
Pros Availability Groups has addressed the two big shortfalls of database mirroring. Firstly, that a database could have only one mirror copy; and secondly, that the mirror copy was unusable until disaster struck.
The copies can now be readable, allowing them to be put to work for reporting queries and backup operations, thus directing load away from busy OLTP databases.
Cons The consistency of distributed transactions across databases hosted within an Availability Group is not guaranteed in the event of failover. Therefore applications requiring cross-database consistency shouldn't use Availability Groups.
Additionally, all Availability Group servers must be in the same Windows domain - a restriction that database mirroring didn't impose.
In practice As part of our preparation for a customer's live system, our R&D team built an experimental set of five SQL 2012 enterprise nodes - featuring quad AMD Opteron 6300-series processors with 48 cores, two ioDrive2 Duo cards providing 2.4TB of flash storage, and 512GB of RAM. This set up tested a 1,790MB database with 100,000 transactions every minute from two different external SSIS 2012 engines across a 1Gbps single-hop switched network sustained for 120 minutes.
Our findings were that, with the availability mode called "synchronous-commit replica" selected, the load on the databases caused the updating process to slow down because the secondary replicas had to acknowledge every transaction. Microsoft has warned about this effect. When you get to five nodes this effect became noticeable but the data was found to be ACID proof at all times.
Using the "asynchronous-commit replica" mode, however, would result in certain data loss with just two nodes running. We would only use this configuration when requiring no commitment on the complete 100 per cent accuracy of the data at all times.
Data Quality Services (DQS)
Pros The well-designed and intuitive user interface quickly allows the user to start cleansing and matching data. This is important given that the tool is intended for regular use by business users (aka data stewards) who may not be overly technical.
The built-in data profiling functionality is very useful when gaining an understanding of the data to be cleansed.
It is a relatively straightforward process to generate valid domain values using knowledge discovery on an existing data set. This means that the tool itself will extract information about which values are valid for given domains, without the need for manual intervention.
The ability to build up the cleansing knowledge base over time by feeding the results of a cleansing project back into the base is very effective. This functionality allows DQS to continually develop knowledge of the data over time and then apply this understanding to the cleansing process.
Cons Sadly there is no API to allow the matching process to be automated.
Each time a DQS matching process is run, a data set is generated that may contain a number of clusters. However, the matching knowledge base retains no history of these clusters, so when a subsequent matching process is run with more recent data, clusters may be shuffled around within the generated data set and assigned different cluster IDs. It would have been very helpful if a "master" set of clustered data could persist across multiple runs to allow clusters to retain the same cluster IDs.
Master Data Services (MDS)
Pros The MDS staging schema is auto-generated by the web-based data-modelling tool. The publishing of subscription views, meanwhile, gives the MDS data steward control over which versions of which entities or derived hierarchies are accessible by the business.
Cons The initial installation of MDS required a significant amount of additional configuration, especially with regards to IIS permissions and Silverlight. The MDS web interface is a little lacking in form and overall usability, especially compared with the DQS user interface. Also on the down side, the MDS add-in for Excel does not feel like a fully realised solution in terms of integrating DQS with MDS.
Pros Power View promised to support a highly interactive data exploration and visualisation tool for users of all levels.
With Excel 2010 and 2013, Power View really does enable Excel users to directly access data using a desktop query and manipulate information in real-time. So Power View enables a major step into opening the SQL data layer to the users via an abstraction layer to interact with the data without involvement of IT staff.
Cons With multiple workstations undertaking PowerPivot queries, the processing of the Power View MDX or DAX was slower than manually generated MDX for the same result set, presumably because the generated code is sub-optimal.
Ultimately, it's clear little tweaks to software can be just as important as the headline-grabbing features in a new release.
SQL Server used to be called a JADB engine: Just Another DataBase. But as far back as SQL Server 7.0 in 1998, Microsoft had the foresight to start introducing business intelligence tools. The Redmond software giant understood that the simple storage of transactions was a given and that the era of analysis was about to start. The BI tools that Microsoft has provided over the years for ETL, OLAP, data mining and so on have evolved dramatically.
What we are now looking at 14 years later with SQL Server 2012 is a mature set of tools, which is perhaps why the greatest benefits are perceived to be in the detail. ®
This article was co authored by Andy Hogg, Gordon Meyer, Andre Vermeulen, Sam Vella and Paul Catterson. Andy is a freelance consultant who specializes in Microsoft SQL Server and associated technologies; Andre is a senior architect for BI and enterprise information management at Steria; Gordon runs a consultancy called Consolidata that focuses on BI using Microsoft technology; Sam is a freelance business intelligence architect and developer; Paul is an applications developer in the data warehouse team for Provident Financial. All are studying or have taken the BI course at Dundee University.
Sponsored: RAID: End of an era?