SQL Server 2008 - from semi-relational to sublime
Inside Microsoft's R2 preview
Review SQL Server 2008 R2 is a step closer to reality.
On the heels of August's first code drop, Microsoft has released a second, more-fully-featured community technology preview (CTP) of its next database server. It promises a number of things, including improved business intelligence through database changes and integration with Office as well as more scalability at the high end.
What does the CTP offer ahead of the final product, planned for the first-half of 2010? Let’s start at the beginning.
In the early days of software development, the typical approach with testing new software was to issue pre-release betas to a small number of people assumed to be technically competent. Beta software was challenging to install. No, that's too kind. Installation was an initiation process.
The CTP process, introduced by Microsoft a few years back, is designed to make early code available to a much broader community. As such, it should be much easier to install. But Microsoft has been painfully slow on the uptake here, as was the case with the original SQL Server 2008 CTP.
SQL Server 2008 R2 CTP initially shows things have changed. Once unpacked, the download shows two files: one called setup, and the other called setup.exe. Do not choose setup.exe. Choose setup. Why? By default, Windows hides common extensions and what appears to be setup.exe is actually a file called setup.exe.config and setup is actually setup.exe. Anyone with any pretensions towards technical savvy will sidestep that trap intuitively, it does begin to feel horribly familiar.
But from that point on, the process is a revelation. The installation program checks that the machine is capable of running the CTP, checks that all the necessary components are present, installs them if not, and generally behaves in a commendable fashion. Microsoft appears finally to understand that a CTP is not a beta and the installation went like a dream.
OK, so it installs easily, but what features does the CTP have? SQL Server generally comes in a number of editions that include Standard, Enterprise, and Express. The CTP is the Enterprise edition and comes with the full set of planned features - or thereabouts.
Reporting has undergone much change, and SQL Server Reporting Services has finally caught up with the spatial data types that arrived with SQL Server 2008. You can now map spatial data. The missing Dundas features are now present, including a range of extra charts and dials.
Opens the door to Office 2010
Microsoft has made much of the self-service Business Intelligence and integration with Office. In order to make best use of the BI features it is definitely worth upgrading to Office 2010, released to beta last week. A beta version is available for download here. Excel 2010 allows much better slicing and dicing of data, and through the SQL Server PowerPivot add-in for Excel that was known as Gemini, users can investigate data to reveal the information hidden therein.
The Report Builder is also much improved and looks much more like one of the Office 2010 family. It has become much easier to split out various components of a report: If you have a grid, a map and a logo in a report, these can be copied to a Report Part Gallery, effectively a library of elements that can be used time and time again.
Also of note is the fact Standard edition now includes backup compression, which had previously only been available in the Enterprise edition of SQL Server.
Somewhat oddly, a restriction appears in R2. The Enterprise edition of SQL Server 2008 supported an unlimited number of virtual machines. In the SQL Server 2008 R2 CTP that's constrained to four. If you want more you'll have to upgrade to the DataCenter edition, which is unlimited.
Master Data Services
Following the acquisition of Stratature, SQL Server 2008 R2 introduces Master Data Services (MDS), which allows the consolidation of data from multiple sources. Imagine, for example, three source systems holding customer lists but none of them has a complete set. MDS makes it much easier to consolidate the three lists into one complete, accurate list of customers.
The complete list can, of course, be fed into a data warehouse but the main reason for using MDS is that it allows the data to be synchronised across the three sources so that each has the same version of the data. If it sounds like your thing, it's there for experimentation in the CTP.
Part of the previous CTP - but with further enhancements in this release - is StreamInsight, Microsoft's technology for handling continuously streaming data. The number of organizations dealing with this type of data is increasing as the use of web logs, RFID tags, telemetric, and other streaming data sources increases. The stock market is one scenario where the technology is being used while another is the monitoring of wind-farm data.
StreamInsight is a data engine that sits in front of SQL Server and can handle incoming transactions at the rate of 15,000 per second. Transactions can be averaged and/or aggregated before being written to SQL Server, say, every five seconds, as a much smaller number of rows.
Complex conditions can be applied that control which data is written to SQL Server. Developers can create Complex Event Processing solutions that monitor and mine incoming data to derive information from the patterns within that data. Existing tools such as Microsoft's Visual Studio and .NET can be used for application development and included with the platform is a range of management features, including a management interface, debugging and diagnostic tools.
Not part of the CTP are two new editions of SQL Server - Datacenter and Parallel Data Warehouse, which it's worth taking a quick look at in relation to Enterprise.
Datacenter will offer very high scalability and is aimed obviously at heavily loaded applications. It will feature virtualization, consolidation, and infrastructure management options. Also, more than eight and up to 256 logical processors, and whatever the operating system can handle in the way of memory.
The Parallel Data Warehouse edition - formerly known as Madison - is a marriage of SQL Server with the technology that underpinned the DATAllegro appliance. This essentially transforms SQL Server into a massively parallel processing (MPP) relational engine. You wouldn't want to run the finance application for a small to medium sized enterprise on this because of the cost and complexity involved. However, if your SQL Server-based data warehouse application has been suffering from poor performance, this should be the version for you once it's released.
Loads and nodes
This solution will comprise multiple physical nodes, each with its own storage, CPU, and memory. Each node will run a SQL Server instance, a configuration that Microsoft is calling Ultra Shared Nothing. Performance is maintained by balancing the load across the nodes and redundancy by mirroring all server and storage components.
The Warehouse edition works with SSRS, SQL Server Integration Services, and SQL Server Analysis Services for integration, reporting, and analysis. It supports star join queries and change data capture, both desirable warehouse features, and scales to a capacity in the tens or hundreds of terabytes. Warehouse edition will only come on hardware from partners Bull, Dell, EMC2, Hewlett-Packard, and IBM.
SQL Server has grown up over the years. From the flaky, semi-relational engine that was SQL Server 6.5 it has progressed to a world-class engine that incorporates excellent BI tools and is increasingly wrapped around with extra features such as MPP, MDM, and reporting services that put it in a class of its own. Not only that, the CTP is finally easy to install. If you are interested in the processing of data, go for it. ®